## Model Zoo

Model | ImageNet-1k Top-1 Acc (%) | Params (M) | GFLOPs | Variants
---|---|---|---|---
MicroNet | 51.4 / 59.4 / 62.5 | 2 / 2 / 3 | 0.007 / 0.014 / 0.023 | M1 / M2 / M3
MobileFormer | 76.7 / 77.9 / 79.3 | 9 / 11 / 14 | 0.214 / 0.294 / 0.508 | 214 / 294 / 508
ResNet* | 71.5 / 80.4 / 81.5 | 12 / 26 / 45 | 2 / 4 / 8 | 18 / 50 / 101
GFNet | 80.1 / 81.5 / 82.9 | 15 / 32 / 54 | 2 / 5 / 8 | T / S / B
HireMLP | 79.7 / 82.1 / 83.2 | 18 / 33 / 58 | 2 / 4 / 8 | T / S / B
WaveMLP | 80.9 / 82.9 / 83.3 | 17 / 30 / 44 | 2 / 5 / 8 | T / S / M
PVTv2 | 78.7 / 82.0 / 83.6 | 14 / 25 / 63 | 2 / 4 / 10 | B1 / B2 / B4
ResT | 79.6 / 81.6 / 83.6 | 14 / 30 / 52 | 2 / 4 / 8 | S / B / L
UniFormer | - / 82.9 / 83.8 | - / 22 / 50 | - / 4 / 8 | - / S / B
PoolFormer | 80.3 / 81.4 / 82.1 | 21 / 31 / 56 | 4 / 5 / 9 | S24 / S36 / M36
CycleMLP | 81.6 / 83.0 / 83.2 | 27 / 52 / 76 | 4 / 10 / 12 | B2 / B4 / B5
PatchConvnet | 82.1 / 83.2 / 83.5 | 25 / 48 / 99 | 4 / 8 / 16 | S60 / S120 / B60
ConvNeXt | 82.1 / 83.1 / 83.8 | 28 / 50 / 89 | 5 / 9 / 15 | T / S / B
Shuffle | 82.4 / 83.6 / 84.0 | 28 / 50 / 88 | 5 / 9 / 16 | T / S / B
Conformer | 81.3 / 83.4 / 84.1 | 24 / 38 / 83 | 5 / 11 / 23 | T / S / B
CSWin | 82.7 / 83.6 / 84.2 | 23 / 35 / 78 | 4 / 7 / 15 | T / S / B
### Table Notes
- ResNet* results are taken from the "ResNet strikes back" paper.
- Only models trained on ImageNet-1k with an image size of 224x224 are included.
- Model weights are taken from the respective official repositories.
- Large models (more than 100M parameters) are not included.
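The Params column counts learnable weights in millions. As a rough illustration (not the repo's code), per-layer parameter counts can be worked out by hand from the layer shapes:

```python
def conv2d_params(in_ch, out_ch, k, bias=False):
    """Weights of a k x k convolution; ResNet-style convs typically omit bias."""
    return out_ch * in_ch * k * k + (out_ch if bias else 0)

def linear_params(in_features, out_features, bias=True):
    """Weight matrix plus optional bias of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

# Stem conv of a ResNet: 7x7 kernel, 3 -> 64 channels, no bias.
print(conv2d_params(3, 64, 7))    # 9408
# ImageNet-1k classifier head of ResNet-18: 512 features -> 1000 classes.
print(linear_params(512, 1000))   # 513000
```

Summing such counts over every layer and dividing by 1e6 gives the Params (M) figure reported in the table.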
## Requirements

- python >= 3.6
- torch >= 1.8.1
- torchvision >= 0.9.1

Other requirements can be installed with `pip install -r requirements.txt`.
## Show Available Models

```shell
$ python tools/show.py
```

A table with model names and variants will be shown:

```
Model Names    Model Variants
-------------  --------------------------------
ResNet         ['18', '34', '50', '101', '152']
MicroNet       ['M1', 'M2', 'M3']
ConvNeXt       ['T', 'S', 'B']
GFNet          ['T', 'S', 'B']
PVTv2          ['B1', 'B2', 'B3', 'B4', 'B5']
ResT           ['S', 'B', 'L']
Conformer      ['T', 'S', 'B']
Shuffle        ['T', 'S', 'B']
CSWin          ['T', 'S', 'B', 'L']
CycleMLP       ['B1', 'B2', 'B3', 'B4', 'B5']
HireMLP        ['T', 'S', 'B']
WaveMLP        ['T', 'S', 'M']
PoolFormer     ['S24', 'S36', 'M36']
PatchConvnet   ['S60', 'S120', 'B60']
UniFormer      ['S', 'B']
```
## Inference

- Download your desired model's weights from the Model Zoo table.
- Change the `MODEL` and `TEST` parameters in the config file and run the following command:

```shell
$ python tools/infer.py --cfg configs/test.yaml
```

You will see an output similar to this:

```
File: assests\dog.jpg >>>>> Golden retriever
```
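The printed label is the model's top-1 prediction, i.e. the class with the highest output score. A minimal stdlib sketch of that final step (the label table here is a tiny hypothetical subset of the 1000 ImageNet-1k class names):

```python
import math

# Hypothetical two-entry subset of the ImageNet-1k class-name mapping.
LABELS = {207: "Golden retriever", 281: "tabby cat"}

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top1(logits):
    """Return (class index, probability) of the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, probs[idx]

# Fake 1000-dim logits with a peak at class 207.
logits = [0.0] * 1000
logits[207] = 8.0
idx, prob = top1(logits)
print(LABELS.get(idx, str(idx)))  # Golden retriever
```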
## Training

```shell
$ python tools/train.py --cfg configs/train.yaml
```
## Evaluate

```shell
$ python tools/val.py --cfg configs/train.yaml
```
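Evaluation reports top-1 accuracy, the metric used in the Model Zoo table. The metric itself is simply the fraction of predictions that match the ground-truth labels (a stdlib sketch, not the repo's implementation):

```python
def top1_accuracy(preds, labels):
    """Percentage of predictions that exactly match the ground-truth labels."""
    assert len(preds) == len(labels) and labels
    correct = sum(p == t for p, t in zip(preds, labels))
    return 100.0 * correct / len(labels)

# 3 of 4 predictions match the labels.
print(top1_accuracy([1, 2, 3, 4], [1, 2, 0, 4]))  # 75.0
```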
## Fine-tune

Fine-tune on CIFAR-10:

```shell
$ python tools/finetune.py --cfg configs/finetune.yaml
```
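Fine-tuning on CIFAR-10 typically means replacing the pretrained 1000-way ImageNet classifier head with a 10-way one while reusing the backbone. Assuming a 512-d feature vector (as in ResNet-18; the feature dimension varies per model), the head shrinks accordingly:

```python
def head_params(feat_dim, num_classes):
    # Fully connected classifier head: weight matrix plus bias vector.
    return feat_dim * num_classes + num_classes

print(head_params(512, 1000))  # 513000 parameters for ImageNet-1k
print(head_params(512, 10))    # 5130 parameters for CIFAR-10
```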
## Citations
@misc{li2021micronet,
title={MicroNet: Improving Image Recognition with Extremely Low FLOPs},
author={Yunsheng Li and Yinpeng Chen and Xiyang Dai and Dongdong Chen and Mengchen Liu and Lu Yuan and Zicheng Liu and Lei Zhang and Nuno Vasconcelos},
year={2021},
eprint={2108.05894},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{wightman2021resnet,
title={ResNet strikes back: An improved training procedure in timm},
author={Ross Wightman and Hugo Touvron and Hervé Jégou},
year={2021},
eprint={2110.00476},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{rao2021global,
title={Global Filter Networks for Image Classification},
author={Yongming Rao and Wenliang Zhao and Zheng Zhu and Jiwen Lu and Jie Zhou},
year={2021},
eprint={2107.00645},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{wang2021pvtv2,
title={PVTv2: Improved Baselines with Pyramid Vision Transformer},
author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
year={2021},
eprint={2106.13797},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{zhang2021rest,
title={ResT: An Efficient Transformer for Visual Recognition},
author={Qinglong Zhang and Yubin Yang},
year={2021},
eprint={2105.13677},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{touvron2021augmenting,
title={Augmenting Convolutional networks with attention-based aggregation},
author={Hugo Touvron and Matthieu Cord and Alaaeldin El-Nouby and Piotr Bojanowski and Armand Joulin and Gabriel Synnaeve and Hervé Jégou},
year={2021},
eprint={2112.13692},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{peng2021conformer,
title={Conformer: Local Features Coupling Global Representations for Visual Recognition},
author={Zhiliang Peng and Wei Huang and Shanzhi Gu and Lingxi Xie and Yaowei Wang and Jianbin Jiao and Qixiang Ye},
year={2021},
eprint={2105.03889},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{huang2021shuffle,
title={Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer},
author={Zilong Huang and Youcheng Ben and Guozhong Luo and Pei Cheng and Gang Yu and Bin Fu},
year={2021},
eprint={2106.03650},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{dong2022cswin,
title={CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows},
author={Xiaoyi Dong and Jianmin Bao and Dongdong Chen and Weiming Zhang and Nenghai Yu and Lu Yuan and Dong Chen and Baining Guo},
year={2022},
eprint={2107.00652},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{chen2021cyclemlp,
title={CycleMLP: A MLP-like Architecture for Dense Prediction},
author={Shoufa Chen and Enze Xie and Chongjian Ge and Runjian Chen and Ding Liang and Ping Luo},
year={2021},
eprint={2107.10224},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{guo2021hiremlp,
title={Hire-MLP: Vision MLP via Hierarchical Rearrangement},
author={Jianyuan Guo and Yehui Tang and Kai Han and Xinghao Chen and Han Wu and Chao Xu and Chang Xu and Yunhe Wang},
year={2021},
eprint={2108.13341},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{yu2021metaformer,
title={MetaFormer is Actually What You Need for Vision},
author={Weihao Yu and Mi Luo and Pan Zhou and Chenyang Si and Yichen Zhou and Xinchao Wang and Jiashi Feng and Shuicheng Yan},
year={2021},
eprint={2111.11418},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{tang2021image,
title={An Image Patch is a Wave: Phase-Aware Vision MLP},
author={Yehui Tang and Kai Han and Jianyuan Guo and Chang Xu and Yanxi Li and Chao Xu and Yunhe Wang},
year={2021},
eprint={2111.12294},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{liu2022convnet,
title={A ConvNet for the 2020s},
author={Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
year={2022},
eprint={2201.03545},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{li2022uniformer,
title={UniFormer: Unifying Convolution and Self-attention for Visual Recognition},
author={Kunchang Li and Yali Wang and Junhao Zhang and Peng Gao and Guanglu Song and Yu Liu and Hongsheng Li and Yu Qiao},
year={2022},
eprint={2201.09450},
archivePrefix={arXiv},
primaryClass={cs.CV}
}