VideoNetworks / LAPS-transformer

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

VideoNetworks (Version 2)

This is official implementation (ver2.0) of VideoNet, containing the following models:

  • 2022 ACM Multimedia Oral Paper "LAPS: Long-term Leap Attention, short-term Periodic Shift for Video Classification" Paper link.
  • 2021 ACM Multimedia Paper "Token Shifit Transformer for Video Classification". We build an a pure-convolutional free video transformer and achieve SOTA performance 80.53% on Kinetics-400 Paper link.

Updates

Sept. 16, 2022

  • Release LAPS to public.
  • Release this the 2nd version of TokShift to public.
  • Ver2 includes the following modifications:
    • Directly decode video mp4 file during training/evaluation
    • Change to adopt standarlize timm code-base.
    • Performances of TokShift are further improved than reported in paper version (average +0.5).

Model Zoo and Baselines

We present experimental comparion between verion 1.0 and 2.0 in Model Zoo

Installation

conda create -n tokshift python=3.7
conda activate tokshift
On V100:
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
On A100:
pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio===0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
pip install -r requirements (or pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements)

Quick Start

Training

  1. Donwload Kinetics-400 datasets
  2. Re-organize the Kinetics-400 dataets in the following structure. (structure files), use rename.py to remove space " " in category name.
  3. Set-up Wandb according to https://docs.wandb.ai/quickstart#1.-set-up-wandb
kinetics400_mmlab
|_train.csv
|_val.csv
|_test.csv
|_cate_dict.json
|_process
|_val
|   |_[category name 0]
|   |   |_[video name 0].mp4
|   |   |_[video name 1].mp4
|   |   |_[video name 2].mp4
|   |   |_...
|   |_[category name 2]
|   |   |_[video name 0].mp4
|   |   |_[video name 1].mp4
|   |   |_[video name 2].mp4
|   |   |_...
|   |_...
|_train
    |_[category name 0]
    |   |_[video name 0].mp4
    |   |_[video name 1].mp4
    |   |_[video name 2].mp4
    |   |_...
    |_[category name 2]
    |   |_[video name 0].mp4
    |   |_[video name 1].mp4
    |   |_[video name 2].mp4
    |   |_...
    |_...
  1. Use train script (train.sh under ./scripts) to trian k400
#!/usr/bin/env python
import os

# Visformer_LAPS_8x8
cmd = "python -u main_tokshift.py \
               --multiprocessing-distributed --world-size 1 --rank 0 \
               --dist-ur tcp://127.0.0.1:23677 \
 	       --tune_from pretrain/visformer_s_in10k.pth \
               --cfg_file config/custom/k400/visformer/visformer_LAPS_8x8.yaml"
os.system(cmd)

cmd = "python -u main_tokshift.py \
               --multiprocessing-distributed --world-size 1 --rank 0 \
               --dist-ur tcp://127.0.0.1:23677 \
               --cfg_file config/custom/k400/tokshift_8x32_b16.yaml"
os.system(cmd)

Test with eval.sh

#!/usr/bin/env python
import os

cmd = "python -u main_tokshift.py \
               --multiprocessing-distributed --world-size 1 --rank 0 \
               --dist-ur tcp://127.0.0.1:23677 \
               --eval \
               --resume checkpoints/TokShift_vit_base_patch16_224_in21k_kinetics_C400_8x32_E18_LR0.06_B6_S224/best_ckpt_e17.pth \
               --cfg_file config/custom/k400/tokshift_8x32_b16.yaml"
os.system(cmd)

Contributors

VideoNet is written and maintained by Dr. Hao Zhang and Dr. Yanbin Hao.

Citing

If you find TokShift-xfmr is useful in your research, please use the following BibTeX entry for citation.

@inproceedings{zhang2022leap,
  title={Long-term Leap Attention, short-term Periodic Shift for Video Classification},
  author={Zhang, Hao and Cheng, Lechao and Hao, Yanbin and Ngo, Chong-Wah},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  year={2022}
}
@inproceedings{zhang2021token,
  title={Token Shift Transformer for Video Classification},
  author={Zhang, Hao and Hao, Yanbin and Ngo, Chong-Wah},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  pages={917--925},
  year={2021}
}

Acknowledgement

Thanks for the following Github projects:

About


Languages

Language:Python 100.0%