efficient-deep-learning self-supervised-learning vision-transformer

Efficient Training of Visual Transformers with Small Datasets

To appear in NerIPS 2021.

[paper][Poster & Video][arXiv][code] [reviews]
Yahui Liu^1,3, Enver Sangineto¹, Wei Bi², Nicu Sebe¹, Bruno Lepri³, Marco De Nadai³
¹University of Trento, Italy, ²Tencent AI Lab, China, ³Bruno Kessler Foundation, Italy.

Data preparation

Dataset	Download Link
ImageNet	train,val
CIFAR-10	all
CIFAR-100	all
SVHN	train,test, extra
Oxford-Flower102	images, labels, splits
Clipart	images, train_list, test_list
Infograph	images, train_list, test_list
Painting	images, train_list, test_list
Quickdraw	images, train_list, test_list
Real	images, train_list, test_list
Sketch	images, train_list, test_list

Download the datasets and pre-processe some of them (i.e., imagenet, domainnet) by using codes in the scripts folder.
The datasets are prepared with the following stucture (except CIFAR-10/100 and SVHN):

dataset_name
  |__train
  |    |__category1
  |    |    |__xxx.jpg
  |    |    |__...
  |    |__category2
  |    |    |__xxx.jpg
  |    |    |__...
  |    |__...
  |__val
       |__category1
       |    |__xxx.jpg
       |    |__...
       |__category2
       |    |__xxx.jpg
       |    |__...
       |__...

Training

After prepare the datasets, we can simply start the training with 8 NVIDIA V100 GPUs:

sh train.sh

Evaluation

We can also load the pre-trained model and test the performance:

sh eval.sh

Pretrained models

For fast evaluation, we present the results of Swin-T trained with 100 epochs on various datasets as an example (Note that we save the model every 5 epochs during the training, so the attached best models may be slight different from the reported performances).

Datasets	Baseline	Ours
CIFAR-10	59.47	83.89
CIFAR-100	53.28	66.23
SVHN	71.60	94.23
Flowers102	34.51	39.37
Clipart	38.05	47.47
Infograph	8.20	10.16
Painting	35.92	41.86
Quickdraw	24.08	69.41
Real	73.47	75.59
Sketch	11.97	38.55

We provide a demo to download the pretrained models from Google Drive directly:

python3 ./scripts/collect_models.py

Related Work:

Acknowledgments

This code is highly based on the Swin-Transformer. Thanks to the contributors of this project.

Citation

@InProceedings{liu2021efficient,
    author    = {Liu, Yahui and Sangineto, Enver and Bi, Wei and Sebe, Nicu and Lepri, Bruno and De Nadai, Marco},
    title     = {Efficient Training of Visual Transformers with Small Datasets},
    booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
    year      = {2021}
}

If you have any questions, please contact me without hesitation (yahui.cvrs AT gmail.com).

About

NeurIPS 2021, Official codes for "Efficient Training of Visual Transformers with Small Datasets".

efficient-deep-learning self-supervised-learning vision-transformer

MIT License

Languages

Language:Python 98.6%Language:Shell 1.4%