guoxih / region-based-non-local-network

[Codes of paper]: Region-based Non-local operation for Video Classification

Home Page: https://arxiv.org/abs/2007.09033

Region-based Non-local operation for Video Classification [arXiv]

Citation

Please [★star] this repo and [cite] the following arXiv paper if you find RNL useful:

@article{huang2020region,
  title={Region-based Non-local Operation for Video Classification},
  author={Huang, Guoxi and Bors, Adrian G},
  journal={arXiv preprint arXiv:2007.09033},
  year={2020}
}

Prerequisites

Data Preparation

Please refer to TSM repo for the details of data preparation.

Pretrained Models

The accuracy may differ slightly from the numbers in the paper because we made some modifications to our models. For example, instead of the SE module reported in the paper, we use the channel-gate module from GCNet to model channel attention.

| method | n-frame | Kinetics Acc. | checkpoint |
| --- | --- | --- | --- |
| NL I3D-ResNet50 | 32 * 10clips | 74.9% | - |
| RNL TSM-ResNet50 | 8 * 10clips | 75.6% | link |
| RNL TSM-ResNet50 | 16 * 10clips | 77.2% | link |
| RNL TSM-ResNet50 | (16+8) * 10clips | 77.4% | - |

On Kinetics, the RNL TSM models outperform the NL I3D model while using less computation (shorter input clips: 8 or 16 frames instead of 32).

| method | n-frame | Something-V1 Acc. | checkpoint |
| --- | --- | --- | --- |
| RNL TSM-ResNet50 | 8 * 2clips | 49.5% | link |
| RNL TSM-ResNet50 | 16 * 2clips | 51.0% | link |
| RNL TSM-ResNet50 | (8+16) * 2clips | 52.7% | - |
| RNL TSM-ResNet101 | 8 * 2clips | 50.8% | link |
| RNL 101 + RNL 50 | (8+16) * 2clips | 54.1% | - |

Training

We provide several examples of training an RNL network with this repo:

  • To train on Kinetics from ImageNet pretrained models, you can run the script below:
python main.py --dataset kinetics  --dense_sample --dist-url 'tcp://localhost:6666' \
--dist-backend 'nccl' --multiprocessing-distributed --available_gpus 0,1,2,3 --world-size 1 \
--rank 0 --gd 20 --shift --shift_div=8 --shift_place=blockres --npb --lr 0.02 --wd 2e-4 \
--dropout 0.5 --num_segments 8 --batch_size 16 --batch_multiplier 4 --use_warmup --warmup_epochs 5 \
--lr_type cos --epochs 100 --non_local  --suffix 1
  • To train on Something-Something V1 from ImageNet pretrained models, you can run the script below:
python main.py --dist-url 'tcp://localhost:6666' --dist-backend 'nccl' \
--multiprocessing-distributed --available_gpus 0,1,2,3 --world-size 1 --rank 0 \
--dataset something --gd 20 --shift --shift_div=8 --shift_place=blockres --npb \
--lr 0.02 --wd 1e-3 --dropout 0.8 --num_segments 8 --batch_size 16 --batch_multiplier 4 \
--use_warmup --warmup_epochs 1 --lr_type cos --epochs 50 --non_local  --suffix 1

# Note that the total batch size is equal to batch_size x batch_multiplier x world_size, and
# you should scale the learning rate with the total batch size. For example, if you use
# a total batch size of 128, you should set the learning rate to 0.04.
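
The note above can be worked through as a small calculation. The sketch below is illustrative and not part of the repo: the helper names are hypothetical, and it assumes a linear scaling rule anchored at the settings used in the example commands (`--batch_size 16 --batch_multiplier 4 --world-size 1 --lr 0.02`), which agrees with the "128 -> 0.04" example in the note.

```python
# Hypothetical helpers (not part of this repo) illustrating the batch-size note.

def effective_batch_size(batch_size, batch_multiplier, world_size):
    """Total batch size = batch_size x batch_multiplier x world_size."""
    return batch_size * batch_multiplier * world_size


def scaled_lr(total_batch, base_lr=0.02, base_batch=64):
    """Scale the learning rate linearly with the total batch size.

    Assumption: the reference point is lr 0.02 at a total batch size of 64,
    taken from the example training commands.
    """
    return base_lr * total_batch / base_batch


# Settings from the example commands: 16 x 4 x 1.
total = effective_batch_size(batch_size=16, batch_multiplier=4, world_size=1)
print("total batch size:", total)
print("learning rate:", scaled_lr(total))

# Doubling the total batch size doubles the learning rate under this rule.
print("learning rate at 128:", scaled_lr(128))
```

If you change `--batch_size`, `--batch_multiplier`, or `--world-size`, recompute the total first and then pass the scaled value to `--lr`.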

Test

To test the downloaded pretrained models, you can run the scripts below, which evaluate RNL in the 8-frame setting:

# test on kinetics
python test_models.py kinetics  \
--weights=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_cos_dense_nl_lr0.02_wd2.0e-04.pth.tar \
--test_segments=8 --batch_size=16 -j 25 --test_crops=3  --dense_sample --full_res

# test on Something
python test_models.py something \
--weights=pretrained/TSM_something_RGB_resnet50_shift8_blockres_avg_segment8_e50_cos_nl_h_8e-4.pth.tar \
--test_segments=8 --batch_size=2 -j 25 --test_crops=3  --twice_sample  --full_res
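
To make the "n-frame" column of the tables concrete, here is a minimal sketch of multi-view evaluation. It is an illustration and not the code in `test_models.py`: the function names are hypothetical, and it assumes that each video is scored on clips x crops views (10 clips for Kinetics and 2 for Something-V1, as in the tables, with `--test_crops=3`) and that per-view class scores are averaged before taking the top class.

```python
# Hypothetical sketch of multi-view testing; not taken from test_models.py.

def views_per_video(num_clips, num_crops):
    """Number of views scored per video, assuming clips and crops multiply."""
    return num_clips * num_crops


def average_scores(view_scores):
    """Average per-view class scores (a list of equal-length lists)."""
    n_views = len(view_scores)
    return [sum(column) / n_views for column in zip(*view_scores)]


def predict(view_scores):
    """Return the index of the class with the highest averaged score."""
    avg = average_scores(view_scores)
    return max(range(len(avg)), key=avg.__getitem__)


print("Kinetics views:", views_per_video(num_clips=10, num_crops=3))
print("Something-V1 views:", views_per_video(num_clips=2, num_crops=3))

# Toy example: three views, two classes (made-up scores).
toy_scores = [[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]
print("predicted class:", predict(toy_scores))
```

Under these assumptions, a larger number of views raises test-time cost proportionally, which is why the tables report the clip count alongside the frame count.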

Other Info

References

This repository is built upon the following baseline implementations.

Contact

For any questions, please feel free to open an issue or contact:

Guoxi Huang: gh825@york.ac.uk

About

License: MIT License

