tsenghungchen / SA-tensorflow

Soft attention mechanism for video caption generation

Generating vocabulary only from the training set

sxs4337 opened this issue

The vocabulary should be generated from the training data only.
Currently, the function at
https://github.com/tsenghungchen/SA-tensorflow/blob/master/Att.py#L370 takes "captions" as its input, and "captions" is built from all of the data (train + val + test).
Ideally, the network should not be fed any words from the test set: any word that appears only at test time should be mapped to <unknown_word> for evaluation.
Thanks.
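
A minimal sketch of the suggested behavior, not the repository's actual code: the vocabulary is built from training captions only, and any word outside it is mapped to `<unknown_word>` when encoding. The names `build_vocab`, `encode`, and `min_count`, and the toy captions, are hypothetical and chosen only for illustration.

```python
from collections import Counter

UNK = "<unknown_word>"  # token name taken from the issue text


def build_vocab(train_captions, min_count=1):
    """Build a word-to-index map from training captions only (hypothetical helper)."""
    counts = Counter(
        word for caption in train_captions for word in caption.lower().split()
    )
    # Index 0 is reserved for the unknown-word token.
    word_to_idx = {UNK: 0}
    for word in sorted(counts):
        if counts[word] >= min_count:
            word_to_idx[word] = len(word_to_idx)
    return word_to_idx


def encode(caption, word_to_idx):
    """Map a caption to indices; words not in the vocabulary fall back to UNK."""
    unk = word_to_idx[UNK]
    return [word_to_idx.get(word, unk) for word in caption.lower().split()]


# Toy data, assumed for illustration only.
train_captions = ["a man is cooking", "a dog is running"]
test_captions = ["a cat is running"]

vocab = build_vocab(train_captions)  # val/test captions are never passed in
encoded_test = [encode(c, vocab) for c in test_captions]
```

Here "cat" never appears in the training captions, so it is encoded with the `<unknown_word>` index instead of receiving its own vocabulary entry.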

Yeah, you're right. Thank you for pointing it out. I'll update the code.