tsenghungchen / SA-tensorflow

Soft attention mechanism for video caption generation

Number of epochs to reproduce paper scores

sxs4337 opened this issue

I was able to write a data-generation script for MSVD (a rough sketch of the kind of feature-extraction step involved is included below).
Could you please comment on the number of epochs needed to reproduce the scores reported in the paper [Yao et al. 2015, Describing Videos by Exploiting Temporal Structure]?
I see that the code sets the default to 900 epochs.
Thanks.
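
For reference, here is a minimal sketch of what such a data-generation step might look like: uniformly sample frames from each MSVD clip and save 4096-d VGG fc7 features. The frame count (80), the Keras VGG16 stand-in, and the `fc2` layer name are illustrative assumptions, not the poster's actual script or this repo's pipeline.

```python
# Sketch: extract per-frame VGG fc7 features for one video clip.
# Assumes OpenCV and a Keras VGG16 as a stand-in feature extractor.
import cv2
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model

base = VGG16(weights='imagenet')  # include_top=True, so fc layers exist
fc7 = Model(inputs=base.input, outputs=base.get_layer('fc2').output)

def extract_features(video_path, n_frames=80):
    """Return an (n_frames, 4096) array of fc7 features for one clip."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    idxs = np.linspace(0, total - 1, n_frames).astype(int)  # uniform sampling
    frames = []
    for i in idxs:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i))
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, (224, 224))[:, :, ::-1]  # BGR -> RGB
        frames.append(frame.astype(np.float32))
    cap.release()
    batch = preprocess_input(np.stack(frames))
    return fc7.predict(batch)  # shape: (n_frames, 4096)
```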

The number 900 is not right; I forgot to change the default number of epochs in test().
Normally, the temporal-attention model takes only about 40-80 epochs to overfit the training data. You can evaluate on the training data to see whether the model has overfit.
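
As one concrete way to run that check, here is a hedged sketch: score each saved checkpoint on the training split and watch METEOR saturate. `generate_captions` and `meteor_score` are hypothetical stand-ins for the repo's test()/evaluation code, not actual SA-tensorflow APIs.

```python
# Hypothetical overfit check: evaluate checkpoints on the *training* split.
def check_overfit(checkpoints, train_clips, train_refs,
                  generate_captions, meteor_score):
    """Print the training-set METEOR for each checkpoint.

    checkpoints -- checkpoint paths, e.g. ['models/model-40', 'models/model-60']
    train_clips -- feature files for the *training* videos
    train_refs  -- dict mapping clip id -> list of reference captions
    """
    for ckpt in checkpoints:
        hyps = generate_captions(ckpt, train_clips)  # clip id -> caption
        score = meteor_score(hyps, train_refs)
        print('%s  train METEOR: %.1f%%' % (ckpt, 100 * score))
        # When this stops improving (typically within 40-80 epochs, per the
        # comment above), the model has essentially memorized the training set.
```
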
I just noticed that I have not actually run the code on either MSVD or DVS; instead, I trained and evaluated on the M-VAD [1] dataset. The METEOR score of model 40 is 5.4%, which is close to the one (4.3%) reported in [2] (though they use GoogLeNet instead of VGG).
Therefore, I have to admit that there is no guarantee the model will reproduce the scores from the original paper.

[1] Torabi et al., "Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research," GCPR 2015.
[2] Venugopalan et al., "Sequence to Sequence – Video to Text," ICCV 2015.