## Model Zoo

Model | ImageNet-1k Top-1 Acc (%) | Params (M) | GFLOPs | Variants
---|---|---|---|---
MicroNet | 51.4 / 59.4 / 62.5 | 2 / 2 / 3 | 0.007 / 0.014 / 0.023 | M1 / M2 / M3
MobileFormer | 76.7 / 77.9 / 79.3 | 9 / 11 / 14 | 0.214 / 0.294 / 0.508 | 214 / 294 / 508
ResNet* | 71.5 / 80.4 / 81.5 | 12 / 26 / 45 | 2 / 4 / 8 | 18 / 50 / 101
GFNet | 80.1 / 81.5 / 82.9 | 15 / 32 / 54 | 2 / 5 / 8 | T / S / B
HireMLP | 79.7 / 82.1 / 83.2 | 18 / 33 / 58 | 2 / 4 / 8 | T / S / B
WaveMLP | 80.9 / 82.9 / 83.3 | 17 / 30 / 44 | 2 / 5 / 8 | T / S / M
PVTv2 | 78.7 / 82.0 / 83.6 | 14 / 25 / 63 | 2 / 4 / 10 | B1 / B2 / B4
ResT | 79.6 / 81.6 / 83.6 | 14 / 30 / 52 | 2 / 4 / 8 | S / B / L
UniFormer | - / 82.9 / 83.8 | - / 22 / 50 | - / 4 / 8 | - / S / B
PoolFormer | 80.3 / 81.4 / 82.1 | 21 / 31 / 56 | 4 / 5 / 9 | S24 / S36 / M36
CycleMLP | 81.6 / 83.0 / 83.2 | 27 / 52 / 76 | 4 / 10 / 12 | B2 / B4 / B5
PatchConvnet | 82.1 / 83.2 / 83.5 | 25 / 48 / 99 | 4 / 8 / 16 | S60 / S120 / B60
ConvNeXt | 82.1 / 83.1 / 83.8 | 28 / 50 / 89 | 5 / 9 / 15 | T / S / B
Shuffle | 82.4 / 83.6 / 84.0 | 28 / 50 / 88 | 5 / 9 / 16 | T / S / B
Conformer | 81.3 / 83.4 / 84.1 | 24 / 38 / 83 | 5 / 11 / 23 | T / S / B
CSWin | 82.7 / 83.6 / 84.2 | 23 / 35 / 78 | 4 / 7 / 15 | T / S / B
### Table Notes
- ResNet* results are taken from the "ResNet strikes back" paper.
- Only models trained on ImageNet-1k with an image size of 224x224 are included.
- Model weights are taken from the respective official repositories.
- Large models (more than 100M parameters) are not included.
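The Params column counts learnable weights in millions. As a rough illustration (not the repo's code), per-layer parameter counts can be worked out by hand from the layer shapes:

```python
def conv2d_params(in_ch, out_ch, k, bias=False):
    """Weights of a k x k convolution; ResNet-style convs typically omit bias."""
    return out_ch * in_ch * k * k + (out_ch if bias else 0)

def linear_params(in_features, out_features, bias=True):
    """Weight matrix plus optional bias of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

# Stem conv of a ResNet: 7x7 kernel, 3 -> 64 channels, no bias.
print(conv2d_params(3, 64, 7))    # 9408
# ImageNet-1k classifier head of ResNet-18: 512 features -> 1000 classes.
print(linear_params(512, 1000))   # 513000
```

Summing such counts over every layer and dividing by 1e6 gives the Params (M) figure reported in the table.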
## Requirements

- python >= 3.6
- torch >= 1.8.1
- torchvision >= 0.9.1

Other requirements can be installed with `pip install -r requirements.txt`.
## Show Available Models

```shell
$ python tools/show.py
```

A table with model names and variants will be shown:

```
Model Names    Model Variants
-------------  --------------------------------
ResNet         ['18', '34', '50', '101', '152']
MicroNet       ['M1', 'M2', 'M3']
ConvNeXt       ['T', 'S', 'B']
GFNet          ['T', 'S', 'B']
PVTv2          ['B1', 'B2', 'B3', 'B4', 'B5']
ResT           ['S', 'B', 'L']
Conformer      ['T', 'S', 'B']
Shuffle        ['T', 'S', 'B']
CSWin          ['T', 'S', 'B', 'L']
CycleMLP       ['B1', 'B2', 'B3', 'B4', 'B5']
HireMLP        ['T', 'S', 'B']
WaveMLP        ['T', 'S', 'M']
PoolFormer     ['S24', 'S36', 'M36']
PatchConvnet   ['S60', 'S120', 'B60']
UniFormer      ['S', 'B']
```
## Inference

- Download your desired model's weights from the Model Zoo table.
- Change the `MODEL` and `TEST` parameters in the config file and run the following command:

```shell
$ python tools/infer.py --cfg configs/test.yaml
```

You will see an output similar to this:

```
File: assests\dog.jpg >>>>> Golden retriever
```
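The printed label is the model's top-1 prediction, i.e. the class with the highest output score. A minimal stdlib sketch of that final step (the label table here is a tiny hypothetical subset of the 1000 ImageNet-1k class names):

```python
import math

# Hypothetical two-entry subset of the ImageNet-1k class-name mapping.
LABELS = {207: "Golden retriever", 281: "tabby cat"}

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top1(logits):
    """Return (class index, probability) of the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, probs[idx]

# Fake 1000-dim logits with a peak at class 207.
logits = [0.0] * 1000
logits[207] = 8.0
idx, prob = top1(logits)
print(LABELS.get(idx, str(idx)))  # Golden retriever
```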
## Training

```shell
$ python tools/train.py --cfg configs/train.yaml
```
## Evaluate

```shell
$ python tools/val.py --cfg configs/train.yaml
```
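Evaluation reports top-1 accuracy, the metric used in the Model Zoo table. The metric itself is simply the fraction of predictions that match the ground-truth labels (a stdlib sketch, not the repo's implementation):

```python
def top1_accuracy(preds, labels):
    """Percentage of predictions that exactly match the ground-truth labels."""
    assert len(preds) == len(labels) and labels
    correct = sum(p == t for p, t in zip(preds, labels))
    return 100.0 * correct / len(labels)

# 3 of 4 predictions match the labels.
print(top1_accuracy([1, 2, 3, 4], [1, 2, 0, 4]))  # 75.0
```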
## Fine-tune

Fine-tune on CIFAR-10:

```shell
$ python tools/finetune.py --cfg configs/finetune.yaml
```
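Fine-tuning on CIFAR-10 typically means replacing the pretrained 1000-way ImageNet classifier head with a 10-way one while reusing the backbone. Assuming a 512-d feature vector (as in ResNet-18; the feature dimension varies per model), the head shrinks accordingly:

```python
def head_params(feat_dim, num_classes):
    # Fully connected classifier head: weight matrix plus bias vector.
    return feat_dim * num_classes + num_classes

print(head_params(512, 1000))  # 513000 parameters for ImageNet-1k
print(head_params(512, 10))    # 5130 parameters for CIFAR-10
```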
## Citations
@misc{li2021micronet,
title={MicroNet: Improving Image Recognition with Extremely Low FLOPs},
author={Yunsheng Li and Yinpeng Chen and Xiyang Dai and Dongdong Chen and Mengchen Liu and Lu Yuan and Zicheng Liu and Lei Zhang and Nuno Vasconcelos},
year={2021},
eprint={2108.05894},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{wightman2021resnet,
title={ResNet strikes back: An improved training procedure in timm},
author={Ross Wightman and Hugo Touvron and Hervé Jégou},
year={2021},
eprint={2110.00476},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{rao2021global,
title={Global Filter Networks for Image Classification},
author={Yongming Rao and Wenliang Zhao and Zheng Zhu and Jiwen Lu and Jie Zhou},
year={2021},
eprint={2107.00645},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{wang2021pvtv2,
title={PVTv2: Improved Baselines with Pyramid Vision Transformer},
author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
year={2021},
eprint={2106.13797},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{zhang2021rest,
title={ResT: An Efficient Transformer for Visual Recognition},
author={Qinglong Zhang and Yubin Yang},
year={2021},
eprint={2105.13677},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{touvron2021augmenting,
title={Augmenting Convolutional networks with attention-based aggregation},
author={Hugo Touvron and Matthieu Cord and Alaaeldin El-Nouby and Piotr Bojanowski and Armand Joulin and Gabriel Synnaeve and Hervé Jégou},
year={2021},
eprint={2112.13692},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{peng2021conformer,
title={Conformer: Local Features Coupling Global Representations for Visual Recognition},
author={Zhiliang Peng and Wei Huang and Shanzhi Gu and Lingxi Xie and Yaowei Wang and Jianbin Jiao and Qixiang Ye},
year={2021},
eprint={2105.03889},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{huang2021shuffle,
title={Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer},
author={Zilong Huang and Youcheng Ben and Guozhong Luo and Pei Cheng and Gang Yu and Bin Fu},
year={2021},
eprint={2106.03650},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{dong2022cswin,
title={CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows},
author={Xiaoyi Dong and Jianmin Bao and Dongdong Chen and Weiming Zhang and Nenghai Yu and Lu Yuan and Dong Chen and Baining Guo},
year={2022},
eprint={2107.00652},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{chen2021cyclemlp,
title={CycleMLP: A MLP-like Architecture for Dense Prediction},
author={Shoufa Chen and Enze Xie and Chongjian Ge and Runjian Chen and Ding Liang and Ping Luo},
year={2021},
eprint={2107.10224},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{guo2021hiremlp,
title={Hire-MLP: Vision MLP via Hierarchical Rearrangement},
author={Jianyuan Guo and Yehui Tang and Kai Han and Xinghao Chen and Han Wu and Chao Xu and Chang Xu and Yunhe Wang},
year={2021},
eprint={2108.13341},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{yu2021metaformer,
title={MetaFormer is Actually What You Need for Vision},
author={Weihao Yu and Mi Luo and Pan Zhou and Chenyang Si and Yichen Zhou and Xinchao Wang and Jiashi Feng and Shuicheng Yan},
year={2021},
eprint={2111.11418},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{tang2021image,
title={An Image Patch is a Wave: Phase-Aware Vision MLP},
author={Yehui Tang and Kai Han and Jianyuan Guo and Chang Xu and Yanxi Li and Chao Xu and Yunhe Wang},
year={2021},
eprint={2111.12294},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{liu2022convnet,
title={A ConvNet for the 2020s},
author={Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
year={2022},
eprint={2201.03545},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{li2022uniformer,
title={UniFormer: Unifying Convolution and Self-attention for Visual Recognition},
author={Kunchang Li and Yali Wang and Junhao Zhang and Peng Gao and Guanglu Song and Yu Liu and Hongsheng Li and Yu Qiao},
year={2022},
eprint={2201.09450},
archivePrefix={arXiv},
primaryClass={cs.CV}
}