IG-65M PyTorch

Unofficial PyTorch (and ONNX) 3D video classification models and weights pre-trained on IG-65M (65 million Instagram videos). The official Facebook Research Caffe2 models and weights are available here.

PyTorch and ONNX Models 🏆

We provide converted .pth (PyTorch) and .pb (ONNX) weights; the original Caffe2 .pkl files are listed for reference.

| Model | Pre-train + fine-tune | Input size | PyTorch (.pth) | ONNX (.pb) | Caffe2 (.pkl) |
| --- | --- | --- | --- | --- | --- |
| R(2+1)D_34 | IG-65M + None | 8x112x112 | r2plus1d_34_clip8_ig65m_from_scratch_9bae36ae.pth | r2plus1d_34_clip8_ig65m_from_scratch_748ab053.pb | r2plus1d_34_clip8_ig65m_from_scratch.pkl |
| R(2+1)D_34 | IG-65M + Kinetics | 8x112x112 | r2plus1d_34_clip8_ft_kinetics_from_ig65m_0aa0550b.pth | r2plus1d_34_clip8_ft_kinetics_from_ig65m_625d61b3.pb | r2plus1d_34_clip8_ft_kinetics_from_ig65m.pkl |
| R(2+1)D_34 | IG-65M + None | 32x112x112 | r2plus1d_34_clip32_ig65m_from_scratch_449a7af9.pth | r2plus1d_34_clip32_ig65m_from_scratch_e304d648.pb | r2plus1d_34_clip32_ig65m_from_scratch.pkl |
| R(2+1)D_34 | IG-65M + Kinetics | 32x112x112 | r2plus1d_34_clip32_ft_kinetics_from_ig65m_ade133f1.pth | r2plus1d_34_clip32_ft_kinetics_from_ig65m_10f4c3bf.pb | r2plus1d_34_clip32_ft_kinetics_from_ig65m.pkl |

Notes

  • ONNX models provided here have not been optimized for inference.
  • Models fine-tuned on Kinetics have 400 classes; the plain IG65M models have 487 classes (8-frame models) and 359 classes (32-frame models).
  • For models fine-tuned on Kinetics you can use the labels from here; a short sketch follows this list.
  • For plain IG65M models there is no label map available.
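
To make the label notes concrete, here is a minimal sketch that maps the 400 outputs of a Kinetics fine-tuned checkpoint to a class name. The kinetics_labels.txt filename is hypothetical; save the linked label list under any name.

```python
# Minimal sketch: map Kinetics-400 logits to a class name.
# "kinetics_labels.txt" is a hypothetical filename for the label list
# linked above: one class name per line, index-aligned with the outputs.
import torch

with open("kinetics_labels.txt") as f:
    labels = [line.strip() for line in f]

scores = torch.rand(1, 400)  # stand-in for model(clip) logits
top = scores.softmax(dim=1).argmax(dim=1).item()
print(labels[top])
```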

Usage 💻

The following describes how to use the models in your own project and how to use our conversion and extraction tools.

In Your Own Project

  • See convert.py and copy the r2plus1d_34 model architecture definition
  • See extract.py for how to load the corresponding weights into the model

Note: we require torchvision v0.4 or later for the model architecture building blocks. A minimal sketch of both steps follows.
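
Below is a minimal sketch of both steps, assuming torchvision v0.4's video ResNet building blocks. The mid-plane fix-ups mirror convert.py's r2plus1d_34 definition at the time of writing; treat convert.py as authoritative. The checkpoint filename and its 359-class head come from the table and notes above.

```python
# Sketch: R(2+1)D-34 built from torchvision's building blocks, with the
# wider (2+1)D mid-planes the IG-65M weights expect in the first block
# of layers 2-4 (mirrors convert.py; check that file for the real thing).
import torch
import torch.nn as nn
from torchvision.models.video.resnet import (
    BasicBlock, Conv2Plus1D, R2Plus1dStem, VideoResNet)


def r2plus1d_34(num_classes):
    model = VideoResNet(block=BasicBlock,
                        conv_makers=[Conv2Plus1D] * 4,
                        layers=[3, 4, 6, 3],
                        stem=R2Plus1dStem)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    model.layer2[0].conv2[0] = Conv2Plus1D(128, 128, 288)
    model.layer3[0].conv2[0] = Conv2Plus1D(256, 256, 576)
    model.layer4[0].conv2[0] = Conv2Plus1D(512, 512, 1152)
    return model


# The 32-frame IG-65M checkpoint from the table above has 359 classes.
model = r2plus1d_34(num_classes=359)
state = torch.load("r2plus1d_34_clip32_ig65m_from_scratch_449a7af9.pth",
                   map_location="cpu")
model.load_state_dict(state)
model.eval()

with torch.no_grad():
    # Dummy clip: batch x channels x frames x height x width.
    scores = model(torch.rand(1, 3, 32, 112, 112))
print(scores.shape)  # torch.Size([1, 359])
```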

Development and Tools

We provide CPU and nvidia-docker-based GPU Dockerfiles for self-contained and reproducible environments.

Use the convenience Makefile to build the Docker image, then drop into a container with a host directory mounted at /data inside the container:

```
make
make run datadir=/Path/To/My/Videos
```

By default we build and run the CPU Docker images; for GPUs run:

```
make dockerfile=Dockerfile.gpu
make gpu
```

The WebcamDataset requires exposing /dev/video0 to the container, which only works on Linux:

```
make
make webcam
```

Convert Weights 🍝

Build the Docker image and enter the container as described above, then see the convert.py tool's --help output and its source.

Extract Features 🍪

Build the Docker image and enter the container as described above, then see the extract.py tool's --help output and its source.
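
As an illustration of the general idea only (extract.py is the tool to use and may work differently): replacing the classifier head with an identity turns the network into a clip-embedding extractor.

```python
# Illustration: emit 512-d pooled clip features instead of class logits
# by swapping the final fully connected layer for an identity.
# Build and load the model as in the usage sketch above (weights omitted).
import torch
import torch.nn as nn
from torchvision.models.video.resnet import (
    BasicBlock, Conv2Plus1D, R2Plus1dStem, VideoResNet)

model = VideoResNet(block=BasicBlock, conv_makers=[Conv2Plus1D] * 4,
                    layers=[3, 4, 6, 3], stem=R2Plus1dStem)
model.fc = nn.Identity()
model.eval()

with torch.no_grad():
    feats = model(torch.rand(1, 3, 32, 112, 112))
print(feats.shape)  # torch.Size([1, 512])
```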

References 📖

  1. D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun and M. Paluri. A Closer Look at Spatiotemporal Convolutions for Action Recognition. CVPR 2018.
  2. D. Tran, H. Wang, L. Torresani and M. Feiszli. Video Classification with Channel-Separated Convolutional Networks. ICCV 2019.
  3. D. Ghadiyaram, M. Feiszli, D. Tran, X. Yan, H. Wang and D. Mahajan. Large-scale weakly-supervised pre-training for video action recognition. CVPR 2019.
  4. VMZ: Model Zoo for Video Modeling
  5. Kinetics & IG-65M

License

Copyright © 2019 MoabitCoin

Distributed under the MIT License (MIT).
