IG-65M PyTorch

Unofficial PyTorch (and ONNX) 3D video classification models and weights pre-trained on IG-65M (65 million Instagram videos). The official Facebook Research Caffe2 models and weights are available here.

PyTorch and ONNX Models 🏆

We provide converted .pth (PyTorch) and .pb (ONNX) weights; the original Caffe2 .pkl files are listed for reference.

| Model | Pre-train + fine-tune | Input size | PyTorch (.pth) | ONNX (.pb) | Caffe2 (.pkl) |
| --- | --- | --- | --- | --- | --- |
| R(2+1)D_34 | IG-65M + None | 8x112x112 | r2plus1d_34_clip8_ig65m_from_scratch_9bae36ae.pth | r2plus1d_34_clip8_ig65m_from_scratch_748ab053.pb | r2plus1d_34_clip8_ig65m_from_scratch.pkl |
| R(2+1)D_34 | IG-65M + Kinetics | 8x112x112 | r2plus1d_34_clip8_ft_kinetics_from_ig65m_0aa0550b.pth | r2plus1d_34_clip8_ft_kinetics_from_ig65m_625d61b3.pb | r2plus1d_34_clip8_ft_kinetics_from_ig65m.pkl |
| R(2+1)D_34 | IG-65M + None | 32x112x112 | r2plus1d_34_clip32_ig65m_from_scratch_449a7af9.pth | r2plus1d_34_clip32_ig65m_from_scratch_e304d648.pb | r2plus1d_34_clip32_ig65m_from_scratch.pkl |
| R(2+1)D_34 | IG-65M + Kinetics | 32x112x112 | r2plus1d_34_clip32_ft_kinetics_from_ig65m_ade133f1.pth | r2plus1d_34_clip32_ft_kinetics_from_ig65m_10f4c3bf.pb | r2plus1d_34_clip32_ft_kinetics_from_ig65m.pkl |

Notes

  • ONNX models provided here have not been optimized for inference.
  • Models fine-tuned on Kinetics have 400 classes; the plain IG65M models have 487 classes (8-frame models) and 359 classes (32-frame models).
  • For models fine-tuned on Kinetics you can use the labels from here; a short sketch follows this list.
  • For plain IG65M models there is no label map available.
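
To make the label notes concrete, here is a minimal sketch that maps the 400 outputs of a Kinetics fine-tuned checkpoint to a class name. The kinetics_labels.txt filename is hypothetical; save the linked label list under any name.

```python
# Minimal sketch: map Kinetics-400 logits to a class name.
# "kinetics_labels.txt" is a hypothetical filename for the label list
# linked above: one class name per line, index-aligned with the outputs.
import torch

with open("kinetics_labels.txt") as f:
    labels = [line.strip() for line in f]

scores = torch.rand(1, 400)  # stand-in for model(clip) logits
top = scores.softmax(dim=1).argmax(dim=1).item()
print(labels[top])
```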

Usage 💻

The following describes how to use the models in your own project and how to use our conversion and extraction tools.

In Your Own Project

  • See convert.py and copy the r2plus1d_34 model architecture definition
  • See extract.py for how to load the corresponding weights into the model

Note: we require torchvision v0.4 or later for the model architecture building blocks. A minimal sketch of both steps follows.
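
Below is a minimal sketch of both steps, assuming torchvision v0.4's video ResNet building blocks. The mid-plane fix-ups mirror convert.py's r2plus1d_34 definition at the time of writing; treat convert.py as authoritative. The checkpoint filename and its 359-class head come from the table and notes above.

```python
# Sketch: R(2+1)D-34 built from torchvision's building blocks, with the
# wider (2+1)D mid-planes the IG-65M weights expect in the first block
# of layers 2-4 (mirrors convert.py; check that file for the real thing).
import torch
import torch.nn as nn
from torchvision.models.video.resnet import (
    BasicBlock, Conv2Plus1D, R2Plus1dStem, VideoResNet)


def r2plus1d_34(num_classes):
    model = VideoResNet(block=BasicBlock,
                        conv_makers=[Conv2Plus1D] * 4,
                        layers=[3, 4, 6, 3],
                        stem=R2Plus1dStem)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    model.layer2[0].conv2[0] = Conv2Plus1D(128, 128, 288)
    model.layer3[0].conv2[0] = Conv2Plus1D(256, 256, 576)
    model.layer4[0].conv2[0] = Conv2Plus1D(512, 512, 1152)
    return model


# The 32-frame IG-65M checkpoint from the table above has 359 classes.
model = r2plus1d_34(num_classes=359)
state = torch.load("r2plus1d_34_clip32_ig65m_from_scratch_449a7af9.pth",
                   map_location="cpu")
model.load_state_dict(state)
model.eval()

with torch.no_grad():
    # Dummy clip: batch x channels x frames x height x width.
    scores = model(torch.rand(1, 3, 32, 112, 112))
print(scores.shape)  # torch.Size([1, 359])
```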

Development and Tools

We provide CPU and nvidia-docker-based GPU Dockerfiles for self-contained and reproducible environments.

Use the convenience Makefile to build the Docker image, then drop into a container with a host directory mounted at /data inside the container:

```
make
make run datadir=/Path/To/My/Videos
```

By default we build and run the CPU Docker images; for GPUs run:

```
make dockerfile=Dockerfile.gpu
make gpu
```

The WebcamDataset requires exposing /dev/video0 to the container, which only works on Linux:

```
make
make webcam
```

Convert Weights 🍝

Build the Docker image and enter the container as described above, then see the convert.py tool's --help output and its source.

Extract Features 🍪

Build the Docker image and enter the container as described above, then see the extract.py tool's --help output and its source.
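
As an illustration of the general idea only (extract.py is the tool to use and may work differently): replacing the classifier head with an identity turns the network into a clip-embedding extractor.

```python
# Illustration: emit 512-d pooled clip features instead of class logits
# by swapping the final fully connected layer for an identity.
# Build and load the model as in the usage sketch above (weights omitted).
import torch
import torch.nn as nn
from torchvision.models.video.resnet import (
    BasicBlock, Conv2Plus1D, R2Plus1dStem, VideoResNet)

model = VideoResNet(block=BasicBlock, conv_makers=[Conv2Plus1D] * 4,
                    layers=[3, 4, 6, 3], stem=R2Plus1dStem)
model.fc = nn.Identity()
model.eval()

with torch.no_grad():
    feats = model(torch.rand(1, 3, 32, 112, 112))
print(feats.shape)  # torch.Size([1, 512])
```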

References 📖

  1. D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun and M. Paluri. A Closer Look at Spatiotemporal Convolutions for Action Recognition. CVPR 2018.
  2. D. Tran, H. Wang, L. Torresani and M. Feiszli. Video Classification with Channel-Separated Convolutional Networks. ICCV 2019.
  3. D. Ghadiyaram, M. Feiszli, D. Tran, X. Yan, H. Wang and D. Mahajan. Large-scale weakly-supervised pre-training for video action recognition. CVPR 2019.
  4. VMZ: Model Zoo for Video Modeling
  5. Kinetics & IG-65M

License

Copyright © 2019 MoabitCoin

Distributed under the MIT License (MIT).
