adeshpande3 / Facebook-Messenger-Bot

Facebook chatbot that I trained to talk like me using Seq2Seq


Invalid outputs

nishant260190 opened this issue

I have followed all the steps you described, and everything runs without errors, but the output results are not correct, maybe because I used a very small dataset. Still, for the exact conversations I included in the training data, it should give the correct result, such as "hello" in response to "hi" or "fine" in response to "how are you".

Just because something is in the training set doesn't mean that the network will learn to output it. The problem could be one of several things: a small dataset, an overly complex network architecture, improperly tuned hyperparameters, not enough variety in the dataset, etc. It's hard to pinpoint which one it is. One useful exercise is to take a very small dataset (a couple of input-output pairs) and a very small network and see whether the network can at least learn those mappings. Once it can, slowly increase the size of the dataset as well as the complexity of the network.
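The "overfit a tiny dataset first" exercise can be sketched without TensorFlow. This is a minimal sketch under a deliberately simplified assumption: the model is a hypothetical bag-of-words softmax classifier that picks a whole reply from the replies seen in training, not the repository's Seq2Seq network, and the helper names and toy pairs are made up for illustration. The point is the workflow: if even a tiny model cannot reproduce two or three memorized pairs, the problem lies in the data pipeline or training loop rather than in the dataset size.

```python
import math
import random

# Hypothetical toy dataset: a few input/reply pairs the model should memorize.
PAIRS = [("hi", "hello"), ("how are you", "fine"), ("bye", "see you")]


def build_vocab(texts):
    """Map each distinct word to an integer index."""
    vocab = {}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def featurize(text, vocab):
    """Bag-of-words count vector for one input sentence."""
    x = [0.0] * len(vocab)
    for word in text.split():
        if word in vocab:
            x[vocab[word]] += 1.0
    return x


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(pairs, epochs=200, lr=0.5, seed=0):
    """Fit a linear softmax model (a stand-in for Seq2Seq) with SGD on cross-entropy."""
    rng = random.Random(seed)
    vocab = build_vocab(p for p, _ in pairs)
    replies = sorted({r for _, r in pairs})
    # weights[c][j]: how much word j votes for reply c
    weights = [[rng.uniform(-0.01, 0.01) for _ in vocab] for _ in replies]
    for _ in range(epochs):
        for prompt, reply in pairs:
            x = featurize(prompt, vocab)
            probs = softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])
            target = replies.index(reply)
            for c, row in enumerate(weights):
                # Gradient of cross-entropy w.r.t. the score of reply c
                grad = probs[c] - (1.0 if c == target else 0.0)
                for j, xi in enumerate(x):
                    row[j] -= lr * grad * xi
    return vocab, replies, weights


def predict(prompt, vocab, replies, weights):
    """Return the highest-scoring reply for a prompt."""
    x = featurize(prompt, vocab)
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return replies[scores.index(max(scores))]


vocab, replies, weights = train(PAIRS)
for prompt, reply in PAIRS:
    print(f"{prompt!r} -> {predict(prompt, vocab, replies, weights)!r} (expected {reply!r})")
```

Once a check like this passes, the same idea carries over to the real model: train the Seq2Seq network on a handful of pairs with small lstmUnits and numLayersLSTM values, confirm that it reproduces them, and only then scale up.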

@adeshpande3 : Thanks for the quick response. I am new to this, so I don't understand how to increase or decrease the complexity of the network. I have not changed the code; it is the same as in this repository.
Also, on what basis should I set the hyperparameters? These are my current values:

Word2Vec:
wordVecDimensions = 100
batchSize = 128
numNegativeSample = 64
windowSize = 5
numIterations = 100000

numTrainingExamples: 919210
vocabSize: 5850

Seq2Seq:
batchSize = 24
maxEncoderLength = 15
maxDecoderLength = maxEncoderLength
lstmUnits = 112
embeddingDim = lstmUnits
numLayersLSTM = 3
numIterations = 70000
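The Word2Vec settings above include windowSize, and the printed numTrainingExamples is far larger than vocabSize. One common way to build skip-gram training data pairs each word with every neighbor within windowSize positions, so the number of examples grows with both corpus length and window size. The sketch below assumes that convention; whether the repository's Word2Vec script counts its examples exactly this way is an assumption, and the function name is hypothetical.

```python
def skipgram_pairs(tokens, window_size):
    """Return (center, context) pairs for every context word within
    window_size positions of the center word (skip-gram style)."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


tokens = "how are you doing today".split()
print(len(skipgram_pairs(tokens, 5)))  # 20: every word pairs with the other 4
print(len(skipgram_pairs(tokens, 1)))  # 8: only immediate neighbors
```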

By decreasing the complexity of the network, I mean decreasing the number of LSTM units or the number of LSTM layers (lstmUnits and numLayersLSTM in your Seq2Seq settings).
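One rough way to make "complexity" concrete is to count trainable weights. This sketch assumes standard LSTM cells (four gates, each with input weights, recurrent weights and a bias) and counts a single stack of layers only; it ignores the embedding and output projection layers and whether the encoder and decoder share weights, so it illustrates the scaling rather than giving the exact size of the repository's model. The function names are hypothetical.

```python
def lstm_layer_params(input_dim, units):
    """Trainable parameters in one standard LSTM layer:
    4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (units * (input_dim + units) + units)


def lstm_stack_params(embedding_dim, units, num_layers):
    """Parameters in num_layers stacked LSTM layers. The first layer reads
    embedding_dim-sized inputs; later layers read the previous layer's output."""
    total = lstm_layer_params(embedding_dim, units)
    total += (num_layers - 1) * lstm_layer_params(units, units)
    return total


# Settings from this thread: lstmUnits = embeddingDim = 112, numLayersLSTM = 3
print(lstm_stack_params(112, 112, 3))  # 302400
# A much smaller configuration for a sanity-check run
print(lstm_stack_params(32, 32, 1))  # 8320
```

Because embeddingDim is tied to lstmUnits, halving lstmUnits cuts the per-layer count by roughly a factor of four, while removing a layer drops one full layer's worth of weights.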

@adeshpande3 : Can you please help me understand on what basis we should choose these parameters?

There isn't really an easy answer to that question. It's highly dependent on the task you're trying to solve (question answering in our case), the type of model you're trying to create, and the amount of data and compute power you have. All of these affect the parameter values you choose. I'd recommend watching CS 224 to get a better understanding.