
VTPACK

This repo is the official implementation of "Dynamic Grained Encoder for Vision Transformers" (NeurIPS 2021) in the PyTorch framework.

Installation

Requirements

  • Python >= 3.6
  • PyTorch >= 1.8 and torchvision
  • timm:
    • pip install timm
  • GCC >= 4.9
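
A quick way to verify that the environment matches these requirements is a short Python check (a minimal sketch, not part of this repo):

import sys

import timm
import torch
import torchvision

# Requirements above: Python >= 3.6, PyTorch >= 1.8, torchvision, timm.
assert sys.version_info >= (3, 6), "Python >= 3.6 is required"
print("Python     :", sys.version.split()[0])
print("PyTorch    :", torch.__version__)        # should be >= 1.8
print("torchvision:", torchvision.__version__)
print("timm       :", timm.__version__)
print("CUDA available:", torch.cuda.is_available())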

Build from source

  • git clone https://github.com/StevenGrove/vtpack
  • cd vtpack
  • python setup.py build develop
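
After the build, a quick import can confirm that the package installed correctly (a minimal sketch; the import name vtpack is an assumption based on the repo name and is not documented here):

import torch
import vtpack  # assumed import name; adjust if the actual module layout differs

print("vtpack imported from:", vtpack.__file__)
print("CUDA available for compiled ops:", torch.cuda.is_available())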

Prepare data

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for torchvision's datasets.ImageFolder: the training and validation data are expected to be in the train/ and val/ folders, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
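
Since this is the standard torchvision ImageFolder layout, the prepared data can be sanity-checked with a few lines of PyTorch (the path is a placeholder and the transform is a generic example, not necessarily the exact pipeline used by main.py):

from torchvision import datasets, transforms

# Generic 256x256 evaluation-style transform; replace with the actual
# training/validation pipeline as needed.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("/path/to/imagenet/train", transform=transform)
val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=transform)

print("train images:", len(train_set), "classes:", len(train_set.classes))
print("val images  :", len(val_set), "classes:", len(val_set.classes))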

Usage

Training

# Run the training procedure with a specified number of GPUs
./tools/run_dist_launch.sh <GPU_NUM> <path_to_config> [optional arguments]

# Please refer to main.py for more optional arguments

Inference

# Run the inference procedure with a specified number of GPUs and a model path
./tools/run_dist_launch.sh <GPU_NUM> <path_to_config> --eval --resume <model_path> [optional arguments]

# Please refer to main.py for more optional arguments

Image Classification on ImageNet val set

The following models are trained and evaluated with 256×256 input images. The budget for DGE is 0.5.

Method          Acc1 (%)   Acc5 (%)   MAC (avg)   Project   Model
DeiT-Ti         73.2       91.8       1.7G        Link      GoogleDrive
DeiT-Ti + DGE   73.2       91.7       1.1G        Link      GoogleDrive
DeiT-S          80.6       95.4       6.1G        Link      GoogleDrive
DeiT-S + DGE    80.1       95.0       3.5G        Link      GoogleDrive

More models are coming soon.

Citation

Please cite the paper in your publications if it helps your research.

@inproceedings{song2021dynamic,
    title={Dynamic Grained Encoder for Vision Transformers},
    author={Song, Lin and Zhang, Songyang and Liu, Songtao and Li, Zeming and He, Xuming and Sun, Hongbin and Sun, Jian and Zheng, Nanning},
    booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
    year={2021}
}

Please cite this project in your publications if it helps your research.

@misc{vtpack,
    author = {Song, Lin},
    title = {VTPACK},
    howpublished = {\url{https://github.com/StevenGrove/vtpack}},
    year = {2021}
}

License

Apache License 2.0

