llSourcell / tensorflow_chatbot

Tensorflow chatbot demo by @Sirajology on Youtube

neuralconvo.ini + missing data/train.enc

johndpope opened this issue · comments

The neuralconvo.ini specifies the following files:

[strings]

Mode : train, test, serve

mode = train
train_enc = data/train.enc
train_dec = data/train.dec
test_enc = data/test.enc
test_dec = data/test.enc

but there is no data folder in the repo; there is only the working_dir.

python3 execute.py

Mode : train

Preparing data in working_dir/
Tokenizing data in data/train.enc
Traceback (most recent call last):
File "execute.py", line 313, in
train()
File "execute.py", line 127, in train
enc_train, dec_train, enc_dev, dec_dev, _, _ = data_utils.prepare_custom_data(gConfig['working_directory'],gConfig['train_enc'],gConfig['train_dec'],gConfig['test_enc'],gConfig['test_dec'],gConfig['enc_vocab_size'],gConfig['dec_vocab_size'])
File "/Users/johndpope/Documents/gitWorkspace/tensorflow_chatbot/data_utils.py", line 137, in prepare_custom_data
data_to_token_ids(train_enc, enc_train_ids_path, enc_vocab_path, tokenizer)
File "/Users/johndpope/Documents/gitWorkspace/tensorflow_chatbot/data_utils.py", line 121, in data_to_token_ids
normalize_digits)
File "/Users/johndpope/Documents/gitWorkspace/tensorflow_chatbot/data_utils.py", line 100, in sentence_to_token_ids
words = basic_tokenizer(sentence)
File "/Users/johndpope/Documents/gitWorkspace/tensorflow_chatbot/data_utils.py", line 50, in basic_tokenizer
words.extend(re.split(_WORD_SPLIT, space_separated_fragment))
File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/re.py", line 203, in split
return _compile(pattern, flags).split(string, maxsplit)
TypeError: cannot use a bytes pattern on a string-like object

The .gitignore excluded his data folder from being checked in.

How do you create the .enc files from the movie-dialogs corpus he put in the readme?

EDIT: Okay, got them: https://github.com/suriyadeepan/datasets/tree/master/seq2seq/cornell_movie_corpus/

Doh, I can't do a git-lfs checkout of that repo properly because the account is over quota:
"This repository is over its data quota. Purchase more data packs to restore access"

To rebuild them:

1. mkdir tensorflow_chatbot/data
2. cd tensorflow_chatbot/data
3. Get https://people.mpi-sws.org/~cristian/data/cornell_movie_dialogs_corpus.zip and put the *.txt files in this new data/ dir.
4. git clone https://github.com/suriyadeepan/datasets.git
5. Edit datasets/seq2seq/cornell_movie_corpus/scripts/prepare_data.py and uncomment the last lines so prepare_seq2seq_files executes.
6. python datasets/seq2seq/cornell_movie_corpus/scripts/prepare_data.py

This produces {train,test}.{enc,dec}.

These files are just lines of text. I guess matching line numbers between enc and dec are conversation pairs.
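
If you want to sanity-check that alignment, here is a minimal sketch (file paths taken from the .ini above; the pairing-by-line-number is the guess stated above):

def preview_pairs(enc_path='data/train.enc', dec_path='data/train.dec', n=5):
    # Line i of the .enc file is the prompt; line i of the .dec file is the reply.
    with open(enc_path) as enc, open(dec_path) as dec:
        for i, (question, answer) in enumerate(zip(enc, dec)):
            if i >= n:
                break
            print(question.strip(), '->', answer.strip())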

I had to grab that script manually since I couldn't check out the repo, as mentioned. It appears to be training now.

Same here; I had to correct a few issues with Python 3.5 and the use of re.split that caused errors.

What was the re.split issue?

It is a problem with this line (prepare_data.py, line 8, in get_id2line):

lines=open("movie_lines.txt").read().split('\n')

It gives this error and will not let the file execute:

line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 585399: invalid start byte

How did you guys fix it?

Got it: you have to run prepare_data.py in Python 2.7.

You also have to uncomment the last lines so prepare_seq2seq_files executes.

And make sure you have a Python 2.7 environment for prepare_data.py; remember the assignment code is in Python 2.7 as well.

@shlomis The re.split issue was that re cannot apply a bytes pattern to a str while training. I had to replace line 50 of data_utils.py:

words.extend(re.split(_WORD_SPLIT, space_separated_fragment))

with

try:
    words.extend(re.split(_WORD_SPLIT, str.encode(space_separated_fragment)))
except TypeError:
    words.extend(re.split(_WORD_SPLIT, space_separated_fragment))

because if you just .encode the string you'll get the reverse error while testing.
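
An alternative that avoids the try/except is to make _WORD_SPLIT a str pattern, since on Python 3 the fragments are str. A minimal sketch of the relevant part of data_utils.py (assuming the rest of the pipeline also handles str, which may not hold everywhere in this repo):

import re

# str pattern instead of the original bytes pattern b"([.,!?\"':;)(])",
# so re.split accepts the str fragments produced in Python 3 text mode.
_WORD_SPLIT = re.compile(r"([.,!?\"':;)(])")

def basic_tokenizer(sentence):
    # Split on whitespace, then separate punctuation into its own tokens.
    words = []
    for fragment in sentence.strip().split():
        words.extend(re.split(_WORD_SPLIT, fragment))
    return [w for w in words if w]

print(basic_tokenizer("Hello, world!"))  # ['Hello', ',', 'world', '!']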

@Niko2756 I am able to run it on Python 3 with a minimal amount of tweaks (a few print statements to change and maybe a small error message).

I think this could be related.
suriyadeepan/datasets#1

Here is my fix for Python 3:

def get_id2line():
    lines=open('movie_lines.txt', encoding='utf-8', errors='ignore')
    lines = lines.read()
    lines = lines.split('\n')
    id2line = {}
    for line in lines:
        _line = line.split(' +++$+++ ')
        if len(_line) == 5:
            id2line[_line[0]] = _line[4]
    return id2line

And of course change print to print().

@drewp - sorry, Python newbie here. I get this error when I run prepare_data.py. Any ideas?

File "datasets/seq2seq/cornell_movie_corpus/scripts/prepare_data.py", line 89
    print '\n>> written %d lines' %(i)
                                ^
SyntaxError: invalid syntax

UPDATE: I switched to Python 2 and it ran successfully.

I have been running the training for almost a week. Currently:

global step 253800 learning rate 0.1249 step-time 0.46 perplexity 1.00
eval: bucket 0 perplexity 1936.44
eval: bucket 1 perplexity 2054.61
eval: empty bucket 2
eval: empty bucket 3

Does it ever end?

No, it doesn't end; it's an infinite loop. You can stop it at any time and it should pick up the latest checkpoint.
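
Under the hood, that resume behavior is just the usual checkpoint lookup when the model is created; roughly this pattern (a sketch of the standard TF idiom, not the repo's exact code):

import tensorflow as tf

def maybe_restore(session, model, working_dir='working_dir/'):
    # Restore the newest checkpoint if one exists; otherwise start fresh.
    ckpt = tf.train.get_checkpoint_state(working_dir)
    if ckpt and ckpt.model_checkpoint_path:
        model.saver.restore(session, ckpt.model_checkpoint_path)
    else:
        session.run(tf.global_variables_initializer())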

It doesn't end. However, your results aren't correct. Check that you're running the right Python and TensorFlow versions. You should also check in the .ini file that the train_dec line is correctly set to a .dec file.
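
For reference, the configs quoted in this thread have test_dec pointing at test.enc; the corrected [strings] block should presumably read:

[strings]
mode = train
train_enc = data/train.enc
train_dec = data/train.dec
test_enc = data/test.enc
test_dec = data/test.dec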

I am running with Python 2, I think. I did not know it was an infinite loop, thank you. And I checked the .ini file; it looks fine. Also, in test mode, how are you supposed to use it? Can I still use it as a chatbot (like talk to it)? What is the output about?

Can I test my chatbot? I'm getting this; why is it doing that?
python execute.py

Mode : test

Reading model parameters from working_dir/seq2seq.ckpt-266100

hi
size _UNK
hello
size _UNK

I trained using the default values in the seq2seq.ini file as below. After the checkpoint at 16200 (Reading model parameters from working_dir/seq2seq.ckpt-16200), I always get responses like

_UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK _UNK

[strings]
mode = train
train_enc = data/train.enc
train_dec = data/train.dec
test_enc = data/test.enc
test_dec = data/test.enc

working_directory = working_dir/

[ints]

enc_vocab_size = 20000
dec_vocab_size = 20000

num_layers = 3

layer_size = 256

max_train_data_size = 0
batch_size = 64
steps_per_checkpoint = 300

[floats]
learning_rate = 0.5
learning_rate_decay_factor = 0.99
max_gradient_norm = 5.0

Any help would be appreciated. Thanks.
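
For anyone unsure what _UNK actually is: it's the id that data_utils assigns to any word missing from the generated vocabulary, so an all-_UNK reply means the decoder only ever emits the out-of-vocabulary token. A toy sketch of the lookup (UNK_ID = 3 matches data_utils.py's _PAD, _GO, _EOS, _UNK ordering; the helper name here is mine):

UNK_ID = 3  # _PAD=0, _GO=1, _EOS=2, _UNK=3 in data_utils.py

def sentence_to_ids(sentence, vocab):
    # Words absent from the vocabulary fall back to UNK_ID.
    return [vocab.get(word, UNK_ID) for word in sentence.split()]

vocab = {'hello': 4, 'there': 5}
print(sentence_to_ids('hello stranger', vocab))  # [4, 3]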

Has anyone tried training the neuralConvo.ini model? How do you test it? Should we enter the questions listed in test.enc and expect the predicted output from test.dec, or is there another way to test this?

@hariom-yadaw, did you read the introduction tutorial at https://www.tensorflow.org/tutorials/seq2seq/? It roughly explains the .ini settings.

I had the UNK issue as well, and could not get to a stage where you would have something like a conversational experience, no matter whether I used a few or a lot (> 2 hrs) of training iterations.

@2075 Yes, I had gone through the tutorial. I also trained it overnight (> 12 hrs), but the UNK issue is always there. Can you please explain how to overcome this issue? Thanks.

I only get "facing Klein Chub Chub Chub Chub Strip Strip Strip Strip" :(

@hariom-yadaw In my case, with 3 layers and 256 units, I get kind of usable results after more than a day of training. Before that I tried smaller layer sizes, and fewer and more layers, but my dual GTX 690 quad-SLI setup cannot crunch all of it. As long as your training sources are good, you should get some kind of result, and the longer I train, the fewer _UNK replies come back. It is still far from a real conversation, but a worthwhile test.

@2075 I was following the post below about LSTM networks:
http://colah.github.io/posts/2015-08-Understanding-LSTMs/

I'm not very sure what num_layers = 3 and layer_size = 256 refer to here. I want to play with these parameters, which relate to network size, but I don't have a clear understanding of them. Can you (or anyone else) please explain them and how they affect performance? Thanks!
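
For intuition: in seq2seq_model.py, layer_size is the number of hidden units inside each recurrent (GRU) cell and num_layers is how many of those cells are stacked on top of each other. Roughly, against the TF 0.12-era API this repo targets (a sketch; the namespace moved in later versions):

import tensorflow as tf

layer_size = 256  # hidden units per GRU cell (the network's "width")
num_layers = 3    # stacked cells (the network's "depth")

cells = [tf.nn.rnn_cell.GRUCell(layer_size) for _ in range(num_layers)]
stacked_cell = tf.nn.rnn_cell.MultiRNNCell(cells)

Larger values give the model more capacity but make training slower and demand more data.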

@2075 I have tried all the solutions and I'm still getting only _UNKs. Which Python version (2 or 3) and which TensorFlow are you using? If possible, can you please share a working setup? Thanks.

@2075 @hariom-yadaw
I trained it for more than a day with 3 layers and a layer size of 256, and I still get results like _UNK _UNK ...
I cloned the repo from here, but as it did not include test.dec, test.enc, train.dec, and train.enc, I downloaded them separately from Dropbox as mentioned above. I have Python 3.5.2 and TensorFlow 0.12. Can anyone please let me know what I am missing, as I have no clue now? Thanks.

Hello,

I am trying to make a chatbot in TensorFlow. I cloned the code from GitHub, and when I try to run the execute.py file I get this error:

E:\python\tensorflow_chatbot-master>python execute.py

Mode : train

Preparing data in working_dir/
Tokenizing data in data/train.enc
Traceback (most recent call last):
File "execute.py", line 319, in
train()
File "execute.py", line 127, in train
enc_train, dec_train, enc_dev, dec_dev, _, _ = data_utils.prepare_custom_dat
a(gConfig['working_directory'],gConfig['train_enc'],gConfig['train_dec'],gConfig
['test_enc'],gConfig['test_dec'],gConfig['enc_vocab_size'],gConfig['dec_vocab_si
ze'])
File "E:\python\tensorflow_chatbot-master\data_utils.py", line 137, in prepare
_custom_data
data_to_token_ids(train_enc, enc_train_ids_path, enc_vocab_path, tokenizer)
File "E:\python\tensorflow_chatbot-master\data_utils.py", line 112, in data_to
_token_ids
vocab, _ = initialize_vocabulary(vocabulary_path)
File "E:\python\tensorflow_chatbot-master\data_utils.py", line 87, in initiali
ze_vocabulary
rev_vocab.extend(f.readlines())
File "C:\Python\Python35\lib\site-packages\tensorflow\python\lib\io\file_io.py
", line 131, in readlines
s = self.readline()
File "C:\Python\Python35\lib\site-packages\tensorflow\python\lib\io\file_io.py
", line 124, in readline
return compat.as_str_any(self._read_buf.ReadLineAsString())
File "C:\Python\Python35\lib\site-packages\tensorflow\python\util\compat.py",
line 106, in as_str_any
return as_str(value)
File "C:\Python\Python35\lib\site-packages\tensorflow\python\util\compat.py",
line 84, in as_text
return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1: invalid
start byte

I have seen one comment above with the same error, and I ran prepare_data.py as well, but even after that I am getting the same error.

Can anyone please help with where I am going wrong?
Thanks in advance.

I am running into the same difficulties as @krati23; glad to see it is a common issue.

I figured it out. Go to https://github.com/suriyadeepan/datasets/blob/master/seq2seq/cornell_movie_corpus/pull_data.sh and download all the files, then in seq2seq.ini change the file paths to where you put them.

I am having issues with the tensorflow chatbot and was wondering if I could get pointed in the right direction. When running execute.py I get this error:

Mode : train

Traceback (most recent call last):
File "C:/Users/jonsa/Desktop/tensorflow_chatbot-master/execute.py", line 320, in
train()
File "C:/Users/jonsa/Desktop/tensorflow_chatbot-master/execute.py", line 138, in train
model = create_model(sess, False)
File "C:/Users/jonsa/Desktop/tensorflow_chatbot-master/execute.py", line 105, in create_model
model = seq2seq_model.Seq2SeqModel( gConfig['enc_vocab_size'], gConfig['dec_vocab_size'], buckets, gConfig['layer_size'], gConfig['num_layers'], gConfig['max_gradient_norm'], gConfig['batch_size'], gConfig['learning_rate'], gConfig['learning_rate_decay_factor'], forward_only=forward_only)
File "C:\Users\jonsa\Desktop\tensorflow_chatbot-master\seq2seq_model.py", line 165, in init
softmax_loss_function=softmax_loss_function)
File "C:\Users\jonsa\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\contrib\legacy_seq2seq\python\ops\seq2seq.py", line 1201, in model_with_buckets
decoder_inputs[:bucket[1]])
File "C:\Users\jonsa\Desktop\tensorflow_chatbot-master\seq2seq_model.py", line 164, in
lambda x, y: seq2seq_f(x, y, False),
File "C:\Users\jonsa\Desktop\tensorflow_chatbot-master\seq2seq_model.py", line 128, in seq2seq_f
feed_previous=do_decode)
File "C:\Users\jonsa\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\contrib\legacy_seq2seq\python\ops\seq2seq.py", line 855, in embedding_attention_seq2seq
encoder_cell, encoder_inputs, dtype=dtype)
File "C:\Users\jonsa\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\contrib\rnn\python\ops\core_rnn.py", line 197, in static_rnn
(output, state) = call_cell()
File "C:\Users\jonsa\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\contrib\rnn\python\ops\core_rnn.py", line 184, in
call_cell = lambda: cell(input
, state)
File "C:\Users\jonsa\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\contrib\rnn\python\ops\core_rnn_cell_impl.py", line 881, in call
return self._cell(embedded, state)
File "C:\Users\jonsa\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\contrib\rnn\python\ops\core_rnn_cell_impl.py", line 953, in call
cur_inp, new_state = cell(cur_inp, cur_state)
File "C:\Users\jonsa\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\contrib\rnn\python\ops\core_rnn_cell_impl.py", line 146, in call
with _checked_scope(self, scope or "gru_cell", reuse=self._reuse):
File "C:\Users\jonsa\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 59, in enter
return next(self.gen)
File "C:\Users\jonsa\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\contrib\rnn\python\ops\core_rnn_cell_impl.py", line 77, in _checked_scope
type(cell).name))
ValueError: Attempt to reuse RNNCell <tensorflow.contrib.rnn.python.ops.core_rnn_cell_impl.GRUCell object at 0x0000014886DF21D0> with a different variable scope than its first use. First use of cell was with scope 'embedding_attention_seq2seq/rnn/multi_rnn_cell/cell_0/gru_cell', this attempt is with scope 'embedding_attention_seq2seq/rnn/multi_rnn_cell/cell_1/gru_cell'. Please create a new instance of the cell if you would like it to use a different set of weights. If before you were using: MultiRNNCell([GRUCell(...)] * num_layers), change to: MultiRNNCell([GRUCell(...) for _ in range(num_layers)]). If before you were using the same cell instance as both the forward and reverse cell of a bidirectional RNN, simply create two instances (one for forward, one for reverse). In May 2017, we will start transitioning this cell's behavior to use existing stored weights, if any, when it is called with scope=None (which can lead to silent model degradation, so this error will remain until then.)

Process finished with exit code 1

I also tried the suggested corrections and still nothing

@jonsanti Check out the issue I started: #34
I gave the solution there. All you need to do is use a specific version of TensorFlow in your virtual environment. As far as I understood, they have made a few changes and because of that it does not work properly. My solution should fix the problem; let me know if it doesn't.
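
For anyone who would rather patch the code than pin TensorFlow: the ValueError above spells out the fix itself; construct a fresh cell per layer instead of reusing one instance. A sketch for seq2seq_model.py (assuming GRU cells and the tf.contrib.rnn namespace shown in the traceback):

import tensorflow as tf

size, num_layers = 256, 3

# Before: cell = tf.contrib.rnn.MultiRNNCell([single_cell] * num_layers)
# After, per the error message: one new GRUCell instance per layer.
cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.GRUCell(size) for _ in range(num_layers)])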

I am facing the below error; could you please help me fix it?

[screenshot of a tensorflow.models import error]

I fixed the tensorflow.models import error by downloading the models module of TensorFlow and changing the reference to "tensorflow.models.tutorials.rnn", which is the correct path.

@drewp But there is no movie_lines.txt file.

'module' object has no attribute 'seq2seq'

@gaoshuming (#61): maybe you should try replacing every 'tf.nn.seq2seq' with 'tf.contrib.legacy_seq2seq' in seq2seq_model.py.
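
A low-diff way to apply that rename in seq2seq_model.py is an alias near the imports (a sketch, assuming a TF 1.x version where tf.contrib.legacy_seq2seq exists):

import tensorflow as tf

# tf.nn.seq2seq was moved in TF 1.0; alias the new location, then change
# calls like tf.nn.seq2seq.model_with_buckets(...) to
# seq2seq.model_with_buckets(...).
seq2seq = tf.contrib.legacy_seq2seq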

pywrap_tensorflow.TF_GetCode(status))

NotFoundError: NewRandomAccessFile failed to Create/Open: data/train.enc : The system cannot find the path specified.
Can anyone please tell me what causes this error?

Hello dears,
Did anybody get it working? I am curious to see the Q and A :) 👍
Best!

How do you solve the UNK problem?

Hi guys, do you face this problem?
RecursionError: maximum recursion depth exceeded
Thank you if you can help. Appreciated.

crossent = softmax_loss_function(labels=target, logits=logit)

TypeError: sampled_loss() got an unexpected keyword argument 'logits'
Does anyone know a fix?

Hi, I am getting this error; can someone help me with it?
Preparing data in working_dir/
Tokenizing data in data/train.enc
tokenizing line 100000
Tokenizing data in data/train.dec
tokenizing line 100000
Tokenizing data in data/test.enc
2018-02-07 13:56:48.209757: W d:\nwani\l\tensorflow_1498062690615\work\tensorflow-1.1.0\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2018-02-07 13:56:48.210757: W d:\nwani\l\tensorflow_1498062690615\work\tensorflow-1.1.0\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2018-02-07 13:56:48.210757: W d:\nwani\l\tensorflow_1498062690615\work\tensorflow-1.1.0\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2018-02-07 13:56:48.210757: W d:\nwani\l\tensorflow_1498062690615\work\tensorflow-1.1.0\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-02-07 13:56:48.210757: W d:\nwani\l\tensorflow_1498062690615\work\tensorflow-1.1.0\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-02-07 13:56:48.210757: W d:\nwani\l\tensorflow_1498062690615\work\tensorflow-1.1.0\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-02-07 13:56:48.211757: W d:\nwani\l\tensorflow_1498062690615\work\tensorflow-1.1.0\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-02-07 13:56:48.211757: W d:\nwani\l\tensorflow_1498062690615\work\tensorflow-1.1.0\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Creating 3 layers of 256 units.
Traceback (most recent call last):
File "execute.py", line 319, in
train()
File "execute.py", line 137, in train
model = create_model(sess, False)
File "execute.py", line 104, in create_model
model = seq2seq_model.Seq2SeqModel( gConfig['enc_vocab_size'], gConfig['dec_vocab_size'], _buckets, gConfig['layer_size'], gConfig['num_layers'], gConfig['max_gradient_norm'], gConfig['batch_size'
], gConfig['learning_rate'], gConfig['learning_rate_decay_factor'], forward_only=forward_only)
File "C:\Users\hthakare\python\tensorflow_chatbot-master\seq2seq_model.py", line 106, in init
single_cell = tf.nn.rnn_cell.GRUCell(size)
AttributeError: module 'tensorflow.python.ops.nn' has no attribute 'rnn_cell'
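
That AttributeError is the same TF 1.x API move again: the cell classes left tf.nn.rnn_cell for tf.contrib.rnn. A likely patch for that line (a sketch; verify the namespace against your exact TF version):

import tensorflow as tf

size = 256
# TF 0.12: single_cell = tf.nn.rnn_cell.GRUCell(size)
# TF 1.x:
single_cell = tf.contrib.rnn.GRUCell(size)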

On Windows you can fix this by getting a piece of software called Miniconda, installing it on your system, and then creating a 2.7 environment to run the importer for the movie pack.

I'm able to run python execute.py after making changes to data_utils.py and seq2seq_model.py. The corrected files are available here:
https://github.com/llSourcell/tensorflow_chatbot/pull/77/files

It starts training the model, and testing works once the mode is changed to test.

Thanks to Chrisfauerbach for the corrections.

Hi guys, can someone help me fix this problem?

>> Mode : test

Traceback (most recent call last):
File "execute.py", line 324, in
decode()
File "execute.py", line 220, in decode
enc_vocab, _ = data_utils.initialize_vocabulary(enc_vocab_path)
File "D:\My_document\AI\Chatbot_Conversation\tensorflow_chatbot-master\data_utils.py", line 86, in initialize_vocabulary
rev_vocab.extend(f.readlines())
File "C:\Users\Hoang\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 131, in readlines
s = self.readline()
File "C:\Users\Hoang\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 124, in readline
return compat.as_str_any(self._read_buf.ReadLineAsString())
File "C:\Users\Hoang\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\util\compat.py", line 106, in as_str_any
return as_str(value)
File "C:\Users\Hoang\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\util\compat.py", line 84, in as_text
return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1: invalid start byte

@uccmen you have to change print '\n>> written %d lines' %(i) to print('\n>> written %d lines' % (i)) in Python 3.x.

I get the error below; how should I fix it?

File "D:\Anaconda\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))

NotFoundError: NewRandomAccessFile failed to Create/Open: data/train.enc : The system cannot find the path specified.

Mode : train

Preparing data in working_dir/
2018-08-12 13:52:28.476005: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Creating 3 layers of 256 units.
Traceback (most recent call last):
File "execute.py", line 319, in
train()
File "execute.py", line 137, in train
model = create_model(sess, False)
File "execute.py", line 104, in create_model
model = seq2seq_model.Seq2SeqModel( gConfig['enc_vocab_size'], gConfig['dec_vocab_size'], _buckets, gConfig['layer_size'], gConfig['num_layers'], gConfig['max_gradient_norm'], gConfig['batch_size'], gConfig['learning_rate'], gConfig['learning_rate_decay_factor'], forward_only=forward_only)
File "/home/mark/tensorflow_chatbot/seq2seq_model.py", line 154, in init
self.outputs, self.losses = tf.nn.seq2seq.model_with_buckets(
AttributeError: module 'tensorflow.nn' has no attribute 'seq2seq'

Can anyone help?

I managed to fix most of the issues with the code by looking through most of this forum and doing some of my own research. My bot seems to have named himself Alexander. I have trained him up to a perplexity of ~8; it has been around 12 hours of training.

I am using all the latest versions: Anaconda3, Python 3.7, etc.

I have even done some fixes on the UI, which was throwing configuration parser errors. It seems to be working fine. Alexander is still young; he would need to be trained for approximately another 15 hours.

[image attachment]

Reply below if you would like me to push the changes!

crossent = softmax_loss_function(labels=target, logits=logit)

TypeError: sampled_loss() got an unexpected keyword argument 'logits'
anyone knows a fix?

The following change fixed it for me:

# Fix for: TypeError: sampled_loss() got an unexpected keyword argument 'logits'
def sampled_loss(labels, logits):
    labels = tf.reshape(labels, [-1, 1])
    return tf.nn.sampled_softmax_loss(w_t, b, labels, logits, num_samples,
                                      self.target_vocab_size)

# The old version, for reference:
# def sampled_loss(inputs, labels):
#     labels = tf.reshape(labels, [-1, 1])
#     return tf.nn.sampled_softmax_loss(w_t, b, inputs, labels, num_samples,
#                                       self.target_vocab_size)
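
(Why the parameter names flip: as the quoted line crossent = softmax_loss_function(labels=target, logits=logit) shows, TF 1.x's model_with_buckets calls the loss callback with the keyword arguments labels and logits, so the callback must accept exactly those names; and the newer tf.nn.sampled_softmax_loss signature takes labels before inputs, hence the reordered arguments. This is my reading of the API change.)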

Dear friends, gurus, and experts,

I have one general question about creating a bot using this code. In this exercise we have a knowledge set, with train.enc holding one part of the conversation and train.dec holding the other part (the replies), used to train the bot.
My doubt is: how does the model relate a reply in train.dec to its corresponding question in train.enc?
Actually, I am trying to adapt the same code to develop a customer-support bot for my college project. Here I have a set of FAQs as my knowledge base. I have taken all the questions in the FAQs as the train.enc set and all the answers as the train.dec set. But in this case the questions and answers are tightly coupled. How can I maintain this relevance of answers to questions in my model?

Any help or a pointer in this regard will be very much appreciated.