Video Captioning using S2VT

Requirements

To run my code and reproduce the results, install the following packages first. I used Python 3.6 throughout this project.

PyTorch
Caffe
NumPy
OpenCV (cv2)
imageio
scikit-image

Running instructions

1. Install all the packages listed in the 'Requirements' section.
2. Download the MSVD dataset to Data/YouTubeClips.
3. Change all the paths in the Python files to point to directories in your workspace.
4. Run extract_feats.py to extract the RGB features of the videos.
5. Run train.py to train the model.
6. Run test.py to generate captions for the test videos.

Alternatively, you can extract features directly from a video and generate captions using test_from_video.py.
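Feature extraction (step 4) usually works on a fixed number of frames per video. The snippet below is a minimal sketch of one common way to choose them, evenly spaced sampling with NumPy. The helper name `sample_frame_indices` and the default of 80 frames are assumptions for illustration and are not taken from extract_feats.py.

```python
import numpy as np


def sample_frame_indices(num_frames_in_video, num_samples=80):
    """Return `num_samples` evenly spaced frame indices.

    Hypothetical helper: the default of 80 samples is an assumption,
    not a value read from this repository's scripts.
    """
    if num_frames_in_video <= 0:
        raise ValueError("video has no frames")
    # linspace includes both the first and the last frame; rounding maps
    # the evenly spaced positions onto valid integer frame indices.
    positions = np.linspace(0, num_frames_in_video - 1, num=num_samples)
    return np.round(positions).astype(int)
```

If a video has fewer frames than `num_samples`, some indices repeat, which keeps the output length fixed at the cost of duplicated frames.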

Dataset

You can download the MSVD dataset here.
You can download the extracted video features at Features_VGG and unzip them to "Data/Features_VGG".
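Before running any of the scripts, it can help to confirm that the data directories are where you expect them. The following is a small sketch, assuming the two directory names mentioned in this README sit under a common workspace root. The helper `missing_dirs` is hypothetical and is not part of the repository.

```python
from pathlib import Path

# Directory names taken from this README; adjust them if your layout differs.
EXPECTED_DIRS = ("Data/YouTubeClips", "Data/Features_VGG")


def missing_dirs(root="."):
    """Return the expected data directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected data directories are present.")
```

Data/Features_VGG only needs to exist once you have either downloaded the extracted features or produced them yourself.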

Result

val.json is the ground truth for the test dataset, and result.json contains the generated captions. We use the metrics from WangLei (https://github.com/wangleihitcs/CaptionMetrics).

Acknowledgement

Some code is copied from vijayvee (https://github.com/vijayvee/video-captioning).

YiyongHuang / S2VT