Vicinity-Vision-Transformer
This repo is the official implementation of Vicinity Vision Transformer. It serves as a general-purpose backbone for image classification, semantic segmentation, and object detection tasks.
If you use this code, please cite:
@misc{sun2022vicinity,
title={Vicinity Vision Transformer},
author={Weixuan Sun and Zhen Qin and Hui Deng and Jianyuan Wang and Yi Zhang and Kaihao Zhang and Nick Barnes and Stan Birchfield and Lingpeng Kong and Yiran Zhong},
year={2022},
eprint={2206.10552},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Todo
- Segmentation code
- ImageNet21k pre-training
Weights
VVT on ImageNet-1K
Method | Input size | Acc@1 (%) | #Params (M) | Link |
---|---|---|---|---|
VVT-tiny | 224 | 79.2 | 12.9 | link |
VVT-tiny | 384 | 80.3 | 12.9 | |
VVT-small | 224 | 82.6 | 25.5 | link |
VVT-small | 384 | 83.4 | 25.5 | |
VVT-medium | 224 | 83.8 | 47.9 | link |
VVT-large | 224 | 84.1 | 61.8 | link |
VVT-large | 384 | 84.7 | 61.8 | |