VideoNetworks (Version 2)

This is official implementation (ver2.0) of VideoNet, containing the following models:

2022 ACM Multimedia Oral Paper "LAPS: Long-term Leap Attention, short-term Periodic Shift for Video Classification" Paper link.

2021 ACM Multimedia Paper "Token Shifit Transformer for Video Classification". We build an a pure-convolutional free video transformer and achieve SOTA performance 80.53% on Kinetics-400 Paper link.

Updates
Model Zoo and Baselines
Installation
Quick Start
Contributors
Citing
Acknowledgement

Updates

Sept. 16, 2022

Release LAPS to public.
Release this the 2nd version of TokShift to public.
Ver2 includes the following modifications:
- Directly decode video mp4 file during training/evaluation
- Change to adopt standarlize timm code-base.
- Performances of TokShift are further improved than reported in paper version (average +0.5).

Model Zoo and Baselines

We present experimental comparion between verion 1.0 and 2.0 in Model Zoo

Installation

conda create -n tokshift python=3.7
conda activate tokshift
On V100:
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
On A100:
pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio===0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
pip install -r requirements (or pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements)

Quick Start

Training

Donwload Kinetics-400 datasets
Re-organize the Kinetics-400 dataets in the following structure. (structure files), use rename.py to remove space " " in category name.
Set-up Wandb according to https://docs.wandb.ai/quickstart#1.-set-up-wandb

kinetics400_mmlab
|_train.csv
|_val.csv
|_test.csv
|_cate_dict.json
|_process
|_val
|   |_[category name 0]
|   |   |_[video name 0].mp4
|   |   |_[video name 1].mp4
|   |   |_[video name 2].mp4
|   |   |_...
|   |_[category name 2]
|   |   |_[video name 0].mp4
|   |   |_[video name 1].mp4
|   |   |_[video name 2].mp4
|   |   |_...
|   |_...
|_train
    |_[category name 0]
    |   |_[video name 0].mp4
    |   |_[video name 1].mp4
    |   |_[video name 2].mp4
    |   |_...
    |_[category name 2]
    |   |_[video name 0].mp4
    |   |_[video name 1].mp4
    |   |_[video name 2].mp4
    |   |_...
    |_...

Use train script (train.sh under ./scripts) to trian k400

#!/usr/bin/env python
import os

# Visformer_LAPS_8x8
cmd = "python -u main_tokshift.py \
               --multiprocessing-distributed --world-size 1 --rank 0 \
               --dist-ur tcp://127.0.0.1:23677 \
 	       --tune_from pretrain/visformer_s_in10k.pth \
               --cfg_file config/custom/k400/visformer/visformer_LAPS_8x8.yaml"
os.system(cmd)

cmd = "python -u main_tokshift.py \
               --multiprocessing-distributed --world-size 1 --rank 0 \
               --dist-ur tcp://127.0.0.1:23677 \
               --cfg_file config/custom/k400/tokshift_8x32_b16.yaml"
os.system(cmd)

Test with eval.sh

#!/usr/bin/env python
import os

cmd = "python -u main_tokshift.py \
               --multiprocessing-distributed --world-size 1 --rank 0 \
               --dist-ur tcp://127.0.0.1:23677 \
               --eval \
               --resume checkpoints/TokShift_vit_base_patch16_224_in21k_kinetics_C400_8x32_E18_LR0.06_B6_S224/best_ckpt_e17.pth \
               --cfg_file config/custom/k400/tokshift_8x32_b16.yaml"
os.system(cmd)

Contributors

VideoNet is written and maintained by Dr. Hao Zhang and Dr. Yanbin Hao.

Citing

If you find TokShift-xfmr is useful in your research, please use the following BibTeX entry for citation.

@inproceedings{zhang2022leap,
  title={Long-term Leap Attention, short-term Periodic Shift for Video Classification},
  author={Zhang, Hao and Cheng, Lechao and Hao, Yanbin and Ngo, Chong-Wah},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  year={2022}
}

@inproceedings{zhang2021token,
  title={Token Shift Transformer for Video Classification},
  author={Zhang, Hao and Hao, Yanbin and Ngo, Chong-Wah},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  pages={917--925},
  year={2021}
}

Acknowledgement

Thanks for the following Github projects:

VideoNetworks / LAPS-transformer