efficient-transformers efficient-vision-transformers fast-inference sparse-attention attention-mechanism low-rank sparse-neural-networks vision-transformer

Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers

This repository contains the PyTorch implementation of Sparsifiner (CVPR 2023).

[Project Page] [arXiv (CVPR 2023)]

Usage

Requirements

torch>=1.8.1
torchvision>=0.9.1
timm==0.3.2
tensorboardX
six
fvcore
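
Before training, it can help to confirm that the installed packages match the pins above. The following is a minimal sketch, not part of the repository: the names `REQUIREMENTS`, `parse_version`, `satisfies`, and `check_requirements` are hypothetical, and the version parser only handles plain dotted versions with an optional suffix.

```python
from importlib import metadata

# Pins copied from the requirements list above; None means "any version".
REQUIREMENTS = {
    "torch": (">=", "1.8.1"),
    "torchvision": (">=", "0.9.1"),
    "timm": ("==", "0.3.2"),
    "tensorboardX": None,
    "six": None,
    "fvcore": None,
}


def parse_version(text):
    """Turn '1.8.1+cu111' into (1, 8, 1), ignoring local/build suffixes."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed, constraint):
    """Check an installed version string against an (operator, version) pin."""
    if constraint is None:
        return True
    op, wanted = constraint
    have, want = parse_version(installed), parse_version(wanted)
    return have >= want if op == ">=" else have == want


def check_requirements(requirements=REQUIREMENTS, get_version=metadata.version):
    """Return a list of human-readable problems; empty if everything matches."""
    problems = []
    for name, constraint in requirements.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if not satisfies(installed, constraint):
            op, wanted = constraint
            problems.append(f"{name}: found {installed}, need {op}{wanted}")
    return problems


if __name__ == "__main__":
    for line in check_requirements() or ["all requirements satisfied"]:
        print(line)
```

Note that `timm` is pinned to an exact version, so a newer `timm` is reported as a mismatch rather than accepted.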

Data preparation: download and extract the ImageNet images from http://image-net.org/. The directory structure should be:

│ILSVRC2012/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
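
A quick way to sanity-check the extracted dataset against the layout above is to count the class folders and images in each split. This is a minimal sketch using only the standard library; `summarize_imagenet` is a hypothetical helper, not a function from this repository, and it assumes image files use the `.JPEG` extension shown in the tree.

```python
from pathlib import Path


def summarize_imagenet(root):
    """Return {split: (num_class_dirs, num_jpeg_files)} for train and val."""
    root = Path(root)
    summary = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each class (e.g. n01440764) is expected to be one sub-directory.
        class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
        n_images = sum(
            1
            for class_dir in class_dirs
            for f in class_dir.iterdir()
            if f.suffix.upper() == ".JPEG"
        )
        summary[split] = (len(class_dirs), n_images)
    return summary
```

For example, `summarize_imagenet("ILSVRC2012")` returns a pair of counts per split; a `val` folder with images placed directly under it, rather than inside class sub-directories, would show up as zero class folders.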

Model preparation: download pre-trained models if necessary:

Model	URL	Model	URL
DeiT-Small	link	LVViT-S	link
DeiT-Base	link	LVViT-M	link

Training

To train a Sparsifiner model with the default configuration on ImageNet, run:

Sparsifiner-S

Train on 8 GPUs

bash run_model.sh --IMNET sparsifiner_default 8

License

MIT License

Acknowledgements

Our code is based on DynamicViT, pytorch-image-models, DeiT, and LV-ViT.

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Wei_2023_CVPR,
    author    = {Wei, Cong and Duke, Brendan and Jiang, Ruowei and Aarabi, Parham and Taylor, Graham W. and Shkurti, Florian},
    title     = {Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {22680-22689}
}

About

Demo code for the CVPR 2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"

https://lim142857.github.io/lim142857.github.io-sparsifiner/
