jimilee / image-classification

A collection of SOTA Image Classification Models in PyTorch


SOTA Image Classification Models in PyTorch

Model Zoo

Model          ImageNet-1k Top-1 Acc (%)  Params (M)  FLOPs            Variants & Weights
-------------  -------------------------  ----------  ---------------  ------------------
MicroNet       51.4|59.4|62.5             2|2|3       7M|14M|23M       M1|M2|M3
MobileFormer   76.7|77.9|79.3             9|11|14     214M|294M|508M   214|294|508
ResNet*        71.5|80.4|81.5             12|26|45    2G|4G|8G         18|50|101
GFNet          80.1|81.5|82.9             15|32|54    2G|5G|8G         T|S|B
HireMLP        79.7|82.1|83.2             18|33|58    2G|4G|8G         T|S|B
WaveMLP        80.9|82.9|83.3             17|30|44    2G|5G|8G         T|S|M
PVTv2          78.7|82.0|83.6             14|25|63    2G|4G|10G        B1|B2|B4
ResT           79.6|81.6|83.6             14|30|52    2G|4G|8G         S|B|L
UniFormer      -|82.9|83.8                -|22|50     -|4G|8G          -|S|B
PoolFormer     80.3|81.4|82.1             21|31|56    4G|5G|9G         S24|S36|M36
CycleMLP       81.6|83.0|83.2             27|52|76    4G|10G|12G       B2|B4|B5
PatchConvnet   82.1|83.2|83.5             25|48|99    4G|8G|16G        S60|S120|B60
ConvNeXt       82.1|83.1|83.8             28|50|89    5G|9G|15G        T|S|B
Shuffle        82.4|83.6|84.0             28|50|88    5G|9G|16G        T|S|B
Conformer      81.3|83.4|84.1             24|38|83    5G|11G|23G       T|S|B
CSWin          82.7|83.6|84.2             23|35|78    4G|7G|15G        T|S|B

Table Notes
  • Only models trained on ImageNet-1k at 224x224 image size are included.
  • Model weights come from the respective official repositories.
  • Large models (> 100M parameters) are not included.
  • ResNet* results are from the "ResNet strikes back" paper.

Usage

Requirements
  • python >= 3.6
  • torch >= 1.8.1
  • torchvision >= 0.9.1

Other requirements can be installed with pip install -r requirements.txt.


Show Available Models
$ python tools/show.py

A table with model names and variants will be shown:

Model Names    Model Variants
-------------  --------------------------------
ResNet         ['18', '34', '50', '101', '152']
MicroNet       ['M1', 'M2', 'M3']
ConvNeXt       ['T', 'S', 'B']
GFNet          ['T', 'S', 'B']
PVTv2          ['B1', 'B2', 'B3', 'B4', 'B5']
ResT           ['S', 'B', 'L']
Conformer      ['T', 'S', 'B']
Shuffle        ['T', 'S', 'B']
CSWin          ['T', 'S', 'B', 'L']
CycleMLP       ['B1', 'B2', 'B3', 'B4', 'B5']
HireMLP        ['T', 'S', 'B']
WaveMLP        ['T', 'S', 'M']
PoolFormer     ['S24', 'S36', 'M36']
PatchConvnet   ['S60', 'S120', 'B60']
UniFormer      ['S', 'B']

Inference
  • Download the desired model's weights from the Model Zoo table.
  • Set the MODEL and TEST parameters in the config file (configs/test.yaml), then run the following command.
$ python tools/infer.py --cfg configs/test.yaml

You will see an output similar to this:

File: assests\dog.jpg >>>>> Golden retriever

Training
$ python tools/train.py --cfg configs/train.yaml

Evaluate
$ python tools/val.py --cfg configs/train.yaml
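Evaluation reports the ImageNet-1k top-1 accuracy shown in the Model Zoo; the metric itself is just the fraction of argmax matches. A minimal sketch of the computation:

```python
import torch

def top1_accuracy(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    preds = logits.argmax(dim=1)
    return (preds == targets).float().mean().item()

# Two of three predictions match the targets, so accuracy is ~0.667:
logits = torch.tensor([[2.0, 0.1], [0.3, 1.5], [0.9, 0.2]])
targets = torch.tensor([0, 1, 1])
print(top1_accuracy(logits, targets))
```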

Fine-tune

Fine-tune on CIFAR-10:

$ python tools/finetune.py --cfg configs/finetune.yaml
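Fine-tuning on CIFAR-10 means swapping the 1000-way ImageNet classifier for a 10-way head; tools/finetune.py handles this from the config, but the idea can be sketched generically. The attribute names tried below (head, fc, classifier) are common conventions and an assumption here, since the attribute holding the classifier varies by model family:

```python
import torch.nn as nn

def replace_head(model: nn.Module, num_classes: int = 10) -> nn.Module:
    """Swap the final linear classifier for a freshly initialized one
    (hypothetical helper; attribute names differ between model families)."""
    for attr in ("head", "fc", "classifier"):
        layer = getattr(model, attr, None)
        if isinstance(layer, nn.Linear):
            setattr(model, attr, nn.Linear(layer.in_features, num_classes))
            return model
    raise AttributeError("no linear classification head found")
```

When the target dataset is small, a common variant is to freeze the backbone (`requires_grad_(False)` on everything except the new head) before training.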

References

Citations
@misc{li2021micronet,
  title={MicroNet: Improving Image Recognition with Extremely Low FLOPs}, 
  author={Yunsheng Li and Yinpeng Chen and Xiyang Dai and Dongdong Chen and Mengchen Liu and Lu Yuan and Zicheng Liu and Lei Zhang and Nuno Vasconcelos},
  year={2021},
  eprint={2108.05894},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{wightman2021resnet,
  title={ResNet strikes back: An improved training procedure in timm}, 
  author={Ross Wightman and Hugo Touvron and Hervé Jégou},
  year={2021},
  eprint={2110.00476},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{rao2021global,
  title={Global Filter Networks for Image Classification}, 
  author={Yongming Rao and Wenliang Zhao and Zheng Zhu and Jiwen Lu and Jie Zhou},
  year={2021},
  eprint={2107.00645},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{wang2021pvtv2,
  title={PVTv2: Improved Baselines with Pyramid Vision Transformer}, 
  author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
  year={2021},
  eprint={2106.13797},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{zhang2021rest,
  title={ResT: An Efficient Transformer for Visual Recognition}, 
  author={Qinglong Zhang and Yubin Yang},
  year={2021},
  eprint={2105.13677},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{touvron2021augmenting,
  title={Augmenting Convolutional networks with attention-based aggregation}, 
  author={Hugo Touvron and Matthieu Cord and Alaaeldin El-Nouby and Piotr Bojanowski and Armand Joulin and Gabriel Synnaeve and Hervé Jégou},
  year={2021},
  eprint={2112.13692},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{peng2021conformer,
  title={Conformer: Local Features Coupling Global Representations for Visual Recognition}, 
  author={Zhiliang Peng and Wei Huang and Shanzhi Gu and Lingxi Xie and Yaowei Wang and Jianbin Jiao and Qixiang Ye},
  year={2021},
  eprint={2105.03889},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{huang2021shuffle,
  title={Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer}, 
  author={Zilong Huang and Youcheng Ben and Guozhong Luo and Pei Cheng and Gang Yu and Bin Fu},
  year={2021},
  eprint={2106.03650},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{dong2022cswin,
  title={CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows}, 
  author={Xiaoyi Dong and Jianmin Bao and Dongdong Chen and Weiming Zhang and Nenghai Yu and Lu Yuan and Dong Chen and Baining Guo},
  year={2022},
  eprint={2107.00652},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{chen2021cyclemlp,
  title={CycleMLP: A MLP-like Architecture for Dense Prediction}, 
  author={Shoufa Chen and Enze Xie and Chongjian Ge and Runjian Chen and Ding Liang and Ping Luo},
  year={2021},
  eprint={2107.10224},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{guo2021hiremlp,
  title={Hire-MLP: Vision MLP via Hierarchical Rearrangement}, 
  author={Jianyuan Guo and Yehui Tang and Kai Han and Xinghao Chen and Han Wu and Chao Xu and Chang Xu and Yunhe Wang},
  year={2021},
  eprint={2108.13341},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{yu2021metaformer,
  title={MetaFormer is Actually What You Need for Vision}, 
  author={Weihao Yu and Mi Luo and Pan Zhou and Chenyang Si and Yichen Zhou and Xinchao Wang and Jiashi Feng and Shuicheng Yan},
  year={2021},
  eprint={2111.11418},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{tang2021image,
  title={An Image Patch is a Wave: Phase-Aware Vision MLP}, 
  author={Yehui Tang and Kai Han and Jianyuan Guo and Chang Xu and Yanxi Li and Chao Xu and Yunhe Wang},
  year={2021},
  eprint={2111.12294},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{liu2022convnet,
  title={A ConvNet for the 2020s}, 
  author={Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
  year={2022},
  eprint={2201.03545},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{li2022uniformer,
  title={UniFormer: Unifying Convolution and Self-attention for Visual Recognition}, 
  author={Kunchang Li and Yali Wang and Junhao Zhang and Peng Gao and Guanglu Song and Yu Liu and Hongsheng Li and Yu Qiao},
  year={2022},
  eprint={2201.09450},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

License: MIT