S2VT: Sequence to Sequence: Video to Text

Acknowledgement

This code is modified from jazzsaxmafia's implementation, with several problems in the original code fixed.

$ python extract_feats.py

After feature extraction finishes, split the features into two parts.
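A minimal sketch of one way to do such a split, assuming each video's features were saved as a separate file and that the two parts are a training part and a test part. The function name `split_features`, the `.npy` file names, and the split size are hypothetical and do not come from this repository.

```python
def split_features(feature_paths, n_train):
    """Split feature file paths into two lists: (first part, second part).

    Assumption: one feature file per video. Paths are sorted first so the
    split is reproducible across runs.
    """
    ordered = sorted(feature_paths)
    if not 0 <= n_train <= len(ordered):
        raise ValueError("n_train must be between 0 and the number of files")
    return ordered[:n_train], ordered[n_train:]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    paths = ["vid3.npy", "vid1.npy", "vid2.npy", "vid4.npy"]
    train, test = split_features(paths, n_train=3)
    print(train)
    print(test)
```

The two lists can then be used to move or copy the files into whatever directories model_rgb.py is configured to read.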

$ CUDA_VISIBLE_DEVICES=0 ipython

Inside the ipython session, run:

>>> import model_rgb
>>> model_rgb.train()

Change the training parameters and directory paths in model_rgb.py to match your setup.

>>> import model_rgb
>>> model_rgb.test()

After testing, a text file, "S2VT_results.txt", will be generated.

The generated captions are evaluated with the coco-caption tools.

You can run the shell script get_coco_tools.sh to download the coco tools:

$ ./get_coco_tools.sh

Next, generate the reference JSON file from the ground-truth CSV file:

$ python create_reference.py
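For orientation, here is a hedged sketch of the kind of structure a coco-caption reference file holds: an "images" list and an "annotations" list in which each caption points at its video through "image_id". This is not the code in create_reference.py; the function name `build_reference` and the (video_id, caption) input pairs are assumptions made for illustration, and reading the actual CSV columns is left out.

```python
import json


def build_reference(pairs):
    """Build a COCO-caption-style reference dict from (video_id, caption) pairs.

    Assumption: the ground-truth CSV has already been reduced to such pairs.
    """
    images, annotations, seen = [], [], set()
    for ann_id, (video_id, caption) in enumerate(pairs):
        if video_id not in seen:
            # Each video appears once in "images", however many captions it has.
            seen.add(video_id)
            images.append({"id": video_id})
        annotations.append(
            {"image_id": video_id, "id": ann_id, "caption": caption}
        )
    return {
        "info": {},
        "licenses": [],
        "type": "captions",
        "images": images,
        "annotations": annotations,
    }


if __name__ == "__main__":
    # Hypothetical captions, for illustration only.
    ref = build_reference([("vid1", "a man is running"),
                           ("vid1", "a person jogs"),
                           ("vid2", "a cat is playing")])
    print(json.dumps(ref, indent=2))
```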

Then, generate the results JSON file from the S2VT_results.txt file:

$ python create_result_json.py
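The coco-caption tools expect results as a JSON list of objects with "image_id" and "caption" keys. The sketch below shows that conversion under an assumed input layout of one "video_id<TAB>caption" pair per line; the real layout of S2VT_results.txt may differ, and `results_to_coco` is a hypothetical name rather than a function from create_result_json.py.

```python
import json


def results_to_coco(lines, sep="\t"):
    """Convert 'video_id<sep>caption' lines into COCO-caption result entries.

    Assumption: one generated caption per line, with the video id first.
    """
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        video_id, caption = line.split(sep, 1)
        results.append({"image_id": video_id, "caption": caption.strip()})
    return results


if __name__ == "__main__":
    # Hypothetical result lines, for illustration only.
    sample = ["vid1\ta man is running\n", "vid2\ta cat is playing\n"]
    print(json.dumps(results_to_coco(sample), indent=2))
```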

Finally, you can evaluate the generation results:

$ python eval.py

Please feel free to ask me if you have questions.
I have only committed the RGB part of my code; you can modify it to use optical flow features.

TensorFlow implementation of the paper: Sequence to Sequence: Video to Text
