
VTPACK

This repo is the official implementation of "Dynamic Grained Encoder for Vision Transformers" (NeurIPS 2021) in the PyTorch framework.

Installation

Requirements

  • Python >= 3.6
  • PyTorch >= 1.8 and torchvision
  • timm:
    • pip install timm
  • GCC >= 4.9
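
A quick way to verify that the environment matches these requirements is a short Python check (a minimal sketch, not part of this repo):

import sys

import timm
import torch
import torchvision

# Requirements above: Python >= 3.6, PyTorch >= 1.8, torchvision, timm.
assert sys.version_info >= (3, 6), "Python >= 3.6 is required"
print("Python     :", sys.version.split()[0])
print("PyTorch    :", torch.__version__)        # should be >= 1.8
print("torchvision:", torchvision.__version__)
print("timm       :", timm.__version__)
print("CUDA available:", torch.cuda.is_available())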

Build from source

  • git clone https://github.com/StevenGrove/vtpack
  • cd vtpack
  • python setup.py build develop
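
After the build, a quick import can confirm that the package installed correctly (a minimal sketch; the import name vtpack is an assumption based on the repo name and is not documented here):

import torch
import vtpack  # assumed import name; adjust if the actual module layout differs

print("vtpack imported from:", vtpack.__file__)
print("CUDA available for compiled ops:", torch.cuda.is_available())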

Prepare data

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for torchvision's datasets.ImageFolder: the training and validation data are expected to be in the train/ and val/ folders, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
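
Since this is the standard torchvision ImageFolder layout, the prepared data can be sanity-checked with a few lines of PyTorch (the path is a placeholder and the transform is a generic example, not necessarily the exact pipeline used by main.py):

from torchvision import datasets, transforms

# Generic 256x256 evaluation-style transform; replace with the actual
# training/validation pipeline as needed.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("/path/to/imagenet/train", transform=transform)
val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=transform)

print("train images:", len(train_set), "classes:", len(train_set.classes))
print("val images  :", len(val_set), "classes:", len(val_set.classes))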

Usage

Training

# Run the training procedure with a specified number of GPUs
./tools/run_dist_launch.sh <GPU_NUM> <path_to_config> [optional arguments]

# Please refer to main.py for more optional arguments

Inference

# Run the inference procedure with a specified number of GPUs and a model path
./tools/run_dist_launch.sh <GPU_NUM> <path_to_config> --eval --resume <model_path> [optional arguments]

# Please refer to main.py for more optional arguments

Image Classification on ImageNet val set

The following models are trained and evaluated with 256×256 input images. The budget for DGE is 0.5.

Method          Acc1 (%)   Acc5 (%)   MAC (avg)   Project   Model
DeiT-Ti         73.2       91.8       1.7G        Link      GoogleDrive
DeiT-Ti + DGE   73.2       91.7       1.1G        Link      GoogleDrive
DeiT-S          80.6       95.4       6.1G        Link      GoogleDrive
DeiT-S + DGE    80.1       95.0       3.5G        Link      GoogleDrive

More models are coming soon.

Citation

Please cite the paper in your publications if it helps your research.

@inproceedings{song2021dynamic,
    title={Dynamic Grained Encoder for Vision Transformers},
    author={Song, Lin and Zhang, Songyang and Liu, Songtao and Li, Zeming and He, Xuming and Sun, Hongbin and Sun, Jian and Zheng, Nanning},
    booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
    year={2021}
}

Please cite this project in your publications if it helps your research.

@misc{vtpack,
    author = {Song, Lin},
    title = {VTPACK},
    howpublished = {\url{https://github.com/StevenGrove/vtpack}},
    year = {2021}
}

License

Apache License 2.0

