mycorp115 / image-classification

A collection of SOTA Image Classification Models in PyTorch

Intended to make it easy to use SOTA image classification models and to integrate them into object detection, semantic segmentation, pose estimation, and other downstream tasks.

Model Zoo

| Model        | ImageNet-1k Top-1 Acc (%) | Params (M)   | GFLOPs             | Variants & Weights |
|--------------|---------------------------|--------------|--------------------|--------------------|
| MicroNet     | 51.4 / 59.4 / 62.5        | 2 / 2 / 3    | 7M / 14M / 23M     | M1 / M2 / M3       |
| MobileFormer | 76.7 / 77.9 / 79.3        | 9 / 11 / 14  | 214M / 294M / 508M | 214 / 294 / 508    |
| ResNet*      | 71.5 / 80.4 / 81.5        | 12 / 26 / 45 | 2 / 4 / 8          | 18 / 50 / 101      |
| GFNet        | 80.1 / 81.5 / 82.9        | 15 / 32 / 54 | 2 / 5 / 8          | T / S / B          |
| HireMLP      | 79.7 / 82.1 / 83.2        | 18 / 33 / 58 | 2 / 4 / 8          | T / S / B          |
| WaveMLP      | 80.9 / 82.9 / 83.3        | 17 / 30 / 44 | 2 / 5 / 8          | T / S / M          |
| PVTv2        | 78.7 / 82.0 / 83.6        | 14 / 25 / 63 | 2 / 4 / 10         | B1 / B2 / B4       |
| ResT         | 79.6 / 81.6 / 83.6        | 14 / 30 / 52 | 2 / 4 / 8          | S / B / L          |
| UniFormer    | - / 82.9 / 83.8           | - / 22 / 50  | - / 4 / 8          | - / S / B          |
| PoolFormer   | 80.3 / 81.4 / 82.1        | 21 / 31 / 56 | 4 / 5 / 9          | S24 / S36 / M36    |
| CycleMLP     | 81.6 / 83.0 / 83.2        | 27 / 52 / 76 | 4 / 10 / 12        | B2 / B4 / B5       |
| PatchConvnet | 82.1 / 83.2 / 83.5        | 25 / 48 / 99 | 4 / 8 / 16         | S60 / S120 / B60   |
| ConvNeXt     | 82.1 / 83.1 / 83.8        | 28 / 50 / 89 | 5 / 9 / 15         | T / S / B          |
| Shuffle      | 82.4 / 83.6 / 84.0        | 28 / 50 / 88 | 5 / 9 / 16         | T / S / B          |
| Conformer    | 81.3 / 83.4 / 84.1        | 24 / 38 / 83 | 5 / 11 / 23        | T / S / B          |
| CSWin        | 82.7 / 83.6 / 84.2        | 23 / 35 / 78 | 4 / 7 / 15         | T / S / B          |

Table Notes
  • ResNet* results are from the "ResNet strikes back" paper.
  • Only models trained on ImageNet-1k at an image size of 224x224 are included.
  • Model weights come from the respective official repositories.
  • Values marked with an M suffix are MFLOPs.
  • Large models (more than 100M parameters) are not included.

Usage

Requirements (click to expand)
  • python >= 3.6
  • torch >= 1.8.1
  • torchvision >= 0.9.1

Other requirements can be installed with pip install -r requirements.txt.


Show Available Models
$ python tools/show.py

A table with model names and variants will be shown:

Model Names    Model Variants
-------------  --------------------------------
ResNet         ['18', '34', '50', '101', '152']
MicroNet       ['M1', 'M2', 'M3']
ConvNeXt       ['T', 'S', 'B']
GFNet          ['T', 'S', 'B']
PVTv2          ['B1', 'B2', 'B3', 'B4', 'B5']
ResT           ['S', 'B', 'L']
Conformer      ['T', 'S', 'B']
Shuffle        ['T', 'S', 'B']
CSWin          ['T', 'S', 'B', 'L']
CycleMLP       ['B1', 'B2', 'B3', 'B4', 'B5']
HireMLP        ['T', 'S', 'B']
WaveMLP        ['T', 'S', 'M']
PoolFormer     ['S24', 'S36', 'M36']
PatchConvnet   ['S60', 'S120', 'B60']
UniFormer      ['S', 'B']

Inference
  • Download the desired model's weights from the Model Zoo table.
  • Set the MODEL and TEST parameters in the config file here, then run the following command.
$ python tools/infer.py --cfg configs/test.yaml

You will see an output similar to this:

File: assests\dog.jpg >>>>> Golden retriever

Training (click to expand)
$ python tools/train.py --cfg configs/train.yaml

Evaluate (click to expand)
$ python tools/val.py --cfg configs/train.yaml
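The core of an evaluation script like `tools/val.py` is a top-1 accuracy loop over the validation loader; a generic sketch (not the repo's actual code) looks like:

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device='cpu'):
    """Percentage of samples whose argmax prediction matches the target."""
    model.eval()
    correct = total = 0
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == targets).sum().item()
        total += targets.numel()
    return 100.0 * correct / total
```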

Fine-tune (click to expand)

Fine-tune on CIFAR-10:

$ python tools/finetune.py --cfg configs/finetune.yaml

References (click to expand)

Citations (click to expand)
@misc{li2021micronet,
  title={MicroNet: Improving Image Recognition with Extremely Low FLOPs}, 
  author={Yunsheng Li and Yinpeng Chen and Xiyang Dai and Dongdong Chen and Mengchen Liu and Lu Yuan and Zicheng Liu and Lei Zhang and Nuno Vasconcelos},
  year={2021},
  eprint={2108.05894},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{wightman2021resnet,
  title={ResNet strikes back: An improved training procedure in timm}, 
  author={Ross Wightman and Hugo Touvron and Hervé Jégou},
  year={2021},
  eprint={2110.00476},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{rao2021global,
  title={Global Filter Networks for Image Classification}, 
  author={Yongming Rao and Wenliang Zhao and Zheng Zhu and Jiwen Lu and Jie Zhou},
  year={2021},
  eprint={2107.00645},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{wang2021pvtv2,
  title={PVTv2: Improved Baselines with Pyramid Vision Transformer}, 
  author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
  year={2021},
  eprint={2106.13797},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{zhang2021rest,
  title={ResT: An Efficient Transformer for Visual Recognition}, 
  author={Qinglong Zhang and Yubin Yang},
  year={2021},
  eprint={2105.13677},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{touvron2021augmenting,
  title={Augmenting Convolutional networks with attention-based aggregation}, 
  author={Hugo Touvron and Matthieu Cord and Alaaeldin El-Nouby and Piotr Bojanowski and Armand Joulin and Gabriel Synnaeve and Hervé Jégou},
  year={2021},
  eprint={2112.13692},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{peng2021conformer,
  title={Conformer: Local Features Coupling Global Representations for Visual Recognition}, 
  author={Zhiliang Peng and Wei Huang and Shanzhi Gu and Lingxi Xie and Yaowei Wang and Jianbin Jiao and Qixiang Ye},
  year={2021},
  eprint={2105.03889},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{huang2021shuffle,
  title={Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer}, 
  author={Zilong Huang and Youcheng Ben and Guozhong Luo and Pei Cheng and Gang Yu and Bin Fu},
  year={2021},
  eprint={2106.03650},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{dong2022cswin,
  title={CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows}, 
  author={Xiaoyi Dong and Jianmin Bao and Dongdong Chen and Weiming Zhang and Nenghai Yu and Lu Yuan and Dong Chen and Baining Guo},
  year={2022},
  eprint={2107.00652},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{chen2021cyclemlp,
  title={CycleMLP: A MLP-like Architecture for Dense Prediction}, 
  author={Shoufa Chen and Enze Xie and Chongjian Ge and Runjian Chen and Ding Liang and Ping Luo},
  year={2021},
  eprint={2107.10224},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{guo2021hiremlp,
  title={Hire-MLP: Vision MLP via Hierarchical Rearrangement}, 
  author={Jianyuan Guo and Yehui Tang and Kai Han and Xinghao Chen and Han Wu and Chao Xu and Chang Xu and Yunhe Wang},
  year={2021},
  eprint={2108.13341},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{yu2021metaformer,
  title={MetaFormer is Actually What You Need for Vision}, 
  author={Weihao Yu and Mi Luo and Pan Zhou and Chenyang Si and Yichen Zhou and Xinchao Wang and Jiashi Feng and Shuicheng Yan},
  year={2021},
  eprint={2111.11418},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{tang2021image,
  title={An Image Patch is a Wave: Phase-Aware Vision MLP}, 
  author={Yehui Tang and Kai Han and Jianyuan Guo and Chang Xu and Yanxi Li and Chao Xu and Yunhe Wang},
  year={2021},
  eprint={2111.12294},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{liu2022convnet,
  title={A ConvNet for the 2020s}, 
  author={Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
  year={2022},
  eprint={2201.03545},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{li2022uniformer,
  title={UniFormer: Unifying Convolution and Self-attention for Visual Recognition}, 
  author={Kunchang Li and Yali Wang and Junhao Zhang and Peng Gao and Guanglu Song and Yu Liu and Hongsheng Li and Yu Qiao},
  year={2022},
  eprint={2201.09450},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

License

MIT License