Generating vocabulary only from the training set
sxs4337 commented
The vocabulary should be generated only using the training data.
Currently, the function at
https://github.com/tsenghungchen/SA-tensorflow/blob/master/Att.py#L370 takes "captions" as its input, and that is built from all of the data (train + val + test).
Ideally, the network should never be fed words that appear only in the test set; any word it did not see during training should simply be mapped to <unknown_word> at evaluation time.
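Something along these lines might work. This is only a rough sketch of the idea, not the code in Att.py: the names build_vocab, encode and min_count are hypothetical, and it assumes captions are plain strings split on whitespace.

```python
from collections import Counter

UNK = "<unknown_word>"  # token used for any word not seen in training


def build_vocab(train_captions, min_count=1):
    """Build word <-> index maps from the training captions only."""
    counts = Counter(
        word for caption in train_captions for word in caption.lower().split()
    )
    # Keep words that occur at least min_count times; sort for a stable order.
    words = sorted(w for w, c in counts.items() if c >= min_count and w != UNK)
    idx_to_word = [UNK] + words  # index 0 is reserved for unknown words
    word_to_idx = {w: i for i, w in enumerate(idx_to_word)}
    return word_to_idx, idx_to_word


def encode(caption, word_to_idx):
    """Map a caption to indices, sending out-of-vocabulary words to UNK."""
    unk = word_to_idx[UNK]
    return [word_to_idx.get(word, unk) for word in caption.lower().split()]


# Hypothetical data: only the training captions are used to build the vocabulary.
train_captions = ["a man is cooking", "a dog is running"]
word_to_idx, idx_to_word = build_vocab(train_captions)

# "cat" never appears in training, so it becomes <unknown_word> at test time.
encoded = encode("a cat is running", word_to_idx)
decoded = [idx_to_word[i] for i in encoded]
```

The val and test captions would then go through encode only, so they never add entries to the vocabulary.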
Thanks.
Paul Chen commented
Yeah, you're right. Thank you for pointing it out. I'll update the code.