End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)

PDVC

Code for End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021) [arxiv]

Preparation

Environment: Linux, GCC >= 5.4, CUDA >= 9.2, Python >= 3.7, PyTorch >= 1.5.1.

  1. Clone the repo.
git clone --recursive https://github.com/ttengwang/PDVC.git
  2. Create a virtual environment with conda.
conda create -n PDVC python=3.7
source activate PDVC
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch
pip install -r requirement.txt
  3. Prepare the video features.
cd data/anet/features
bash download_anet_c3d.sh
bash download_anet_tsn.sh
  4. Compile the deformable attention layer from the repository root (requires GCC >= 5.4).
cd models/ops
sh make.sh
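
Before compiling, it can help to confirm that the active environment meets the version minimums listed above. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the script only compares version strings (it does not check GCC or verify that the compiled ops load).

```python
import re
import sys


def parse_version(text):
    """Turn '1.7.1+cu101' or '10.1' into a tuple of ints, e.g. (1, 7, 1)."""
    # Keep only the leading dotted-number part; drop suffixes like '+cu101'.
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """Return True if version string `found` is at least `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    # Pad the shorter tuple with zeros so '9.2' compares like '9.2.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Collect (name, found, minimum, ok) rows for the stated requirements."""
    rows = []
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    rows.append(("Python", python_version, "3.7",
                 meets_minimum(python_version, "3.7")))
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so CUDA cannot be queried through it.
        rows.append(("PyTorch", None, "1.5.1", False))
        return rows
    rows.append(("PyTorch", torch.__version__, "1.5.1",
                 meets_minimum(torch.__version__, "1.5.1")))
    cuda = torch.version.cuda  # None for CPU-only builds
    rows.append(("CUDA", cuda, "9.2",
                 cuda is not None and meets_minimum(cuda, "9.2")))
    return rows


if __name__ == "__main__":
    for name, found, minimum, ok in check_environment():
        status = "OK" if ok else "missing or too old"
        print(f"{name}: found {found}, need >= {minimum}: {status}")
```

Run it inside the activated PDVC environment; any row that is not OK points at the requirement to fix before running make.sh.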

Usage

PDVC

  • Training
python train.py --cfg_path cfgs/anet_c3d_pdvc.yml --gpu_id ${GPU_ID}

The script will print the log and evaluate the model every epoch.

  • Evaluation
python eval.py --eval_folder ${eval_folder} --eval_transformer_input_type queries --gpu_id ${GPU_ID}

PDVC with ground-truth (gt) proposals

  • Training
python train.py --cfg_path cfgs/anet_c3d_pdvc_gt.yml --gpu_id ${GPU_ID}
  • Evaluation
python eval.py --eval_folder ${eval_folder} --eval_transformer_input_type gt_proposals --gpu_id ${GPU_ID}
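
The two modes differ only in the config file used for training and in the value of --eval_transformer_input_type used for evaluation. As an illustration, the sketch below assembles the four commands from this section as argument lists. The helper names `train_command` and `eval_command` and the folder name `my_run` are hypothetical; the script names and flags are the ones shown above, and nothing is executed.

```python
import shlex


def train_command(cfg_path, gpu_id):
    """Build the training command for a given config file and GPU id."""
    return ["python", "train.py", "--cfg_path", cfg_path, "--gpu_id", str(gpu_id)]


def eval_command(eval_folder, gpu_id, use_gt_proposals=False):
    """Build the evaluation command; the mode selects the transformer input type."""
    input_type = "gt_proposals" if use_gt_proposals else "queries"
    return [
        "python", "eval.py",
        "--eval_folder", eval_folder,
        "--eval_transformer_input_type", input_type,
        "--gpu_id", str(gpu_id),
    ]


def as_shell(command):
    """Render an argument list as a single shell-safe line."""
    return " ".join(shlex.quote(arg) for arg in command)


if __name__ == "__main__":
    gpu_id = 0           # assumption: a single GPU with id 0
    folder = "my_run"    # hypothetical evaluation folder name
    print(as_shell(train_command("cfgs/anet_c3d_pdvc.yml", gpu_id)))
    print(as_shell(eval_command(folder, gpu_id)))
    print(as_shell(train_command("cfgs/anet_c3d_pdvc_gt.yml", gpu_id)))
    print(as_shell(eval_command(folder, gpu_id, use_gt_proposals=True)))
```

Each list can be passed to `subprocess.run` from the repository root if you want to launch the runs from Python rather than from the shell.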

Performance

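| Model | Features | Url | Recall | Precision | BLEU4 | METEOR2018 | METEOR2021 | CIDEr | SODA_c | METEOR (Para-level) |
|-------|----------|-----|--------|-----------|-------|------------|------------|-------|--------|---------------------|
| PDVC | TSN | Google Drive | 56.21 | 57.46 | 1.92 | 8.00 | 8.63 | 29.00 | 5.68 | 15.85 |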

Some notes:

  • In the paper, we follow most previous methods and use the evaluation toolkit from the ActivityNet Challenge 2018. Note that the latest evaluation toolkit (METEOR2021) gives a higher METEOR score.
  • Paragraph-level METEOR is evaluated on the ActivityNet Entities ae-val set, while the other metrics are evaluated on the standard ActivityNet Captions validation set.

TODO

  • more pretrained models
  • support youcook2

Citation

If you find this repo helpful, please consider citing:

@article{wang2021end,
  title={End-to-End Dense Video Captioning with Parallel Decoding},
  author={Wang, Teng and Zhang, Ruimao and Lu, Zhichao and Zheng, Feng and Cheng, Ran and Luo, Ping},
  journal={arXiv preprint},
  year={2021}
}

@article{wang2020dense,
  title={Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020},
  author={Wang, Teng and Zheng, Huicheng and Yu, Mingjing},
  journal={arXiv preprint arXiv:2006.11693},
  year={2020}
}

License: MIT License