We recommend installing PyTorch and the Python packages below using Anaconda.
- cuda
- pytorch 0.4.0
- python3
- ffmpeg (can be installed using Anaconda)
- tqdm
- pillow
- pretrainedmodels
- nltk
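Before running the scripts, it can help to confirm that the requirements above are visible to the active environment. The following is a minimal sketch, not part of this repository: the helper name `missing_requirements` is hypothetical, and the import names assume the usual conventions (pytorch is imported as `torch`, pillow as `PIL`). It does not check the CUDA or PyTorch versions.

```python
import importlib.util
import shutil

# Import names for the packages listed above
# (pytorch is imported as torch, pillow as PIL).
REQUIRED_MODULES = ["torch", "tqdm", "PIL", "pretrainedmodels", "nltk"]


def missing_requirements(modules=REQUIRED_MODULES, binaries=("ffmpeg",)):
    """Return the names of modules and executables that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when an executable is not on PATH.
    missing += [b for b in binaries if shutil.which(b) is None]
    return missing


if __name__ == "__main__":
    print("missing:", missing_requirements() or "nothing")
```

An empty result only means the names resolve; it does not guarantee that the installed versions match the ones listed.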
MSR-VTT. The test videos don't have captions, so I split the train videos into train/val/test. Extract the files and put them in the ./data/ directory.
- train-video: download link
- test-video: download link
- json info of train-video: download link
- json info of test-video: download link
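The train/val/test split of the captioned train videos mentioned above could look roughly like the sketch below. This is an illustration only: the function name `split_video_ids`, the 80/10/10 proportions, the seed, and the `video<N>` id format are all assumptions, and the split actually used by this project may be defined differently.

```python
import random


def split_video_ids(video_ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle ids deterministically and cut them into train/val/test lists."""
    ids = sorted(video_ids)            # fixed starting order
    random.Random(seed).shuffle(ids)   # reproducible shuffle
    n_val = int(len(ids) * val_frac)
    n_test = int(len(ids) * test_frac)
    n_train = len(ids) - n_val - n_test
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


# Hypothetical ids; the real ones come from the train-video JSON info.
splits = split_video_ids([f"video{i}" for i in range(100)])
print({name: len(ids) for name, ids in splits.items()})
```

Fixing the seed keeps the three subsets stable across runs, so validation and test scores stay comparable.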
All default options are defined in opt.py or in the corresponding code file; change them as you like.
Some of the code is based on ImageCaptioning.pytorch.
Put the Kaggle data into data/5242_data and unzip it. I have already done this and preprocessed the data, so you don't need to repeat it when running this project with the original dataset.
You can use video-classification-3d-cnn-pytorch to extract features from the videos.
- preprocess videos and labels
python prepro_feats.py
python prepro_vocab.py
python prepro_val_data.py
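For readers unfamiliar with caption preprocessing, the sketch below shows what a vocabulary-building step generally involves: count words, drop rare ones, and map captions to fixed-length id sequences. It is not the code in prepro_vocab.py. The special-token names, the `min_count` threshold, the whitespace tokenizer, and the function names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical special tokens; the project's scripts may use other names.
SPECIAL_TOKENS = ["<pad>", "<sos>", "<eos>", "<unk>"]


def build_vocab(captions, min_count=1):
    """Map each word seen at least min_count times to an integer id."""
    counts = Counter(w for c in captions for w in c.lower().split())
    words = sorted(w for w, n in counts.items() if n >= min_count)
    return {w: i for i, w in enumerate(SPECIAL_TOKENS + words)}


def encode(caption, vocab, max_len=8):
    """Turn a caption into exactly max_len ids, truncating or padding."""
    unk = vocab["<unk>"]
    ids = [vocab["<sos>"]]
    ids += [vocab.get(w, unk) for w in caption.lower().split()]
    ids.append(vocab["<eos>"])
    ids = ids[:max_len]  # long captions are cut, which may drop <eos>
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


vocab = build_vocab(["a man is singing", "a man is cooking"], min_count=2)
print(encode("a man is dancing", vocab))
```

Words below the threshold, and words never seen during training, all map to the unknown token.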
- Training a model
python train.py --gpu 0 --epochs 151 --batch_size 128 --checkpoint_path data/save --feats_dir data/feats/resnet152 --model S2VTAttModel --with_c3d 1 --dim_vid 2048 --max_len 5
- test
opt_info.json is saved in the same directory as the saved model.
python eval.py --recover_opt data/save/opt_info.json --saved_model data/save/model_50.pth --batch_size 100 --gpu 0
- predict
python predict.py --recover_opt data/save/opt_info.json --saved_model data/save/model_50.pth --batch_size 128 --gpu 0
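The `--recover_opt` flag above points at the options saved during training. As a rough sketch of the idea, assuming opt_info.json is a flat JSON object mapping option names to values (the file layout is an assumption, and `recover_options` is a hypothetical helper, not a function in this repository), saved options can be reloaded and selectively overridden like this:

```python
import json


def recover_options(path, **overrides):
    """Load saved options and apply any overrides that are not None."""
    with open(path) as f:
        opt = json.load(f)
    # Values passed explicitly take precedence over the saved ones.
    opt.update({k: v for k, v in overrides.items() if v is not None})
    return opt
```

For example, `recover_options("data/save/opt_info.json", batch_size=100)` would keep the saved training settings while changing only the batch size.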