action-recognition video-classification video-understanding non-local rnl pretrained-models kinetics

Region-based Non-local operation for Video Classification [arXiv]

Citation

Please star this repo and cite the following arXiv paper if you find RNL useful:

@article{huang2020region,
  title={Region-based Non-local Operation for Video Classification},
  author={Huang, Guoxi and Bors, Adrian G},
  journal={arXiv preprint arXiv:2007.09033},
  year={2020}
}

Prerequisites

PyTorch 1.3 or higher

Data Preparation

Please refer to the TSM repo for details on data preparation.

Pretrained Models

The accuracy may differ slightly from the paper because we made some modifications to our models. For example, instead of the SE module reported in the paper, we use the Channel-gate module from GCNet to model channel attention.

method	n-frame	Kinetics Acc.	checkpoint
NL I3D-ResNet50	32 * 10clips	74.9%	-
RNL TSM-ResNet50	8 * 10clips	75.6%	link
RNL TSM-ResNet50	16 * 10clips	77.2%	link
RNL TSM-ResNet50	(16+8) * 10clips	77.4%	-

On Kinetics, the RNL TSM models achieve higher accuracy than the NL I3D model while using fewer input frames per clip (8 or 16 instead of 32), and therefore less computation.
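As a rough illustration of the comparison above, the sketch below counts the frames sampled per video for each single-model Kinetics setting, reading the n-frame column as frames per clip times number of clips. Frame count is only a proxy for computation (the backbone, resolution, and spatial crops also matter), and the dictionary and function names are hypothetical, not part of this repo.

```python
# Frames per clip and clips per video, copied from the Kinetics table above.
# The names used here are illustrative and are not part of this repo.
KINETICS_SETTINGS = {
    "NL I3D-ResNet50 (32 frames)": (32, 10),
    "RNL TSM-ResNet50 (8 frames)": (8, 10),
    "RNL TSM-ResNet50 (16 frames)": (16, 10),
}


def frames_per_video(frames_per_clip, num_clips):
    """Total number of frames sampled from one video (spatial crops ignored)."""
    return frames_per_clip * num_clips


for name, (frames, clips) in KINETICS_SETTINGS.items():
    print(f"{name}: {frames_per_video(frames, clips)} frames per video")
```

Under this reading, the 8-frame RNL TSM model samples a quarter of the frames used by the 32-frame NL I3D model.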

method	n-frame	Something-V1 Acc.	checkpoint
RNL TSM-ResNet50	8 * 2clips	49.5%	link
RNL TSM-ResNet50	16 * 2clips	51.0%	link
RNL TSM-ResNet50	(8+16) * 2clips	52.7%	-
RNL TSM-ResNet101	8 * 2clips	50.8%	link
RNL 101 + RNL 50	(8+16) * 2clips	54.1%	-

Training

We provide several examples of training RNL networks with this repo:

To train on Kinetics from ImageNet pretrained models, you can run the script below:

python main.py --dataset kinetics  --dense_sample --dist-url 'tcp://localhost:6666' \
--dist-backend 'nccl' --multiprocessing-distributed --available_gpus 0,1,2,3 --world-size 1 \
--rank 0 --gd 20 --shift --shift_div=8 --shift_place=blockres --npb --lr 0.02 --wd 2e-4 \
--dropout 0.5 --num_segments 8 --batch_size 16 --batch_multiplier 4 --use_warmup --warmup_epochs 5 \
--lr_type cos --epochs 100 --non_local  --suffix 1

To train on Something-Something V1 from ImageNet pretrained models, you can run the script below:

python main.py --dist-url 'tcp://localhost:6666' --dist-backend 'nccl' \
--multiprocessing-distributed --available_gpus 0,1,2,3 --world-size 1 --rank 0 \
--dataset something --gd 20 --shift --shift_div=8 --shift_place=blockres --npb \
--lr 0.02 --wd 1e-3 --dropout 0.8 --num_segments 8 --batch_size 16 --batch_multiplier 4 \
--use_warmup --warmup_epochs 1 --lr_type cos --epochs 50 --non_local  --suffix 1

# Note that the total batch size equals batch_size x batch_multiplier x world_size, and
# you should scale the learning rate with the total batch size. For example, the commands
# above use a total batch size of 16 x 4 x 1 = 64 with a learning rate of 0.02, so with
# a total batch size of 128 you should set the learning rate to 0.04.
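The batch-size arithmetic in the note above can be sketched in a few lines of Python. This is a minimal sketch that assumes linear scaling from the reference setting in the training commands (total batch size 64, learning rate 0.02), which matches the 128 -> 0.04 example. The helper names are hypothetical and are not part of main.py.

```python
# Hypothetical helpers; the reference values come from the training commands above.
BASE_LR = 0.02
BASE_TOTAL_BATCH = 16 * 4 * 1  # batch_size x batch_multiplier x world_size = 64


def total_batch_size(batch_size, batch_multiplier, world_size):
    """Total batch size as defined in the note above."""
    return batch_size * batch_multiplier * world_size


def scaled_lr(new_total_batch, base_lr=BASE_LR, base_total_batch=BASE_TOTAL_BATCH):
    """Scale the learning rate in proportion to the total batch size (assumed linear)."""
    return base_lr * new_total_batch / base_total_batch


if __name__ == "__main__":
    total = total_batch_size(batch_size=16, batch_multiplier=8, world_size=1)
    print(f"total batch size: {total}, suggested --lr: {scaled_lr(total):.3f}")
```

The printed value is what you would pass to --lr if you changed --batch_size, --batch_multiplier, or --world_size from the defaults shown above.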

Test

To test the downloaded pretrained models, run the scripts below, which evaluate RNL in the 8-frame setting:

# test on kinetics
python test_models.py kinetics  \
--weights=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_cos_dense_nl_lr0.02_wd2.0e-04.pth.tar \
--test_segments=8 --batch_size=16 -j 25 --test_crops=3  --dense_sample --full_res

# test on Something
python test_models.py something \
--weights=pretrained/TSM_something_RGB_resnet50_shift8_blockres_avg_segment8_e50_cos_nl_h_8e-4.pth.tar \
--test_segments=8 --batch_size=2 -j 25 --test_crops=3  --twice_sample  --full_res

Other Info

References

This repository is built upon the following baseline implementations.

TSM
TSN

Contact

For any questions, please feel free to open an issue or contact:

Guoxi Huang: gh825@york.ac.uk

About

Code for the paper: Region-based Non-local Operation for Video Classification

https://arxiv.org/abs/2007.09033

MIT License