Vicinity-Vision-Transformer

This repo is the official implementation of Vicinity Vision Transformer (VVT). It serves as a general-purpose backbone for image classification, semantic segmentation, and object detection tasks.

If you use this code, please cite:

@misc{sun2022vicinity,
      title={Vicinity Vision Transformer}, 
      author={Weixuan Sun and Zhen Qin and Hui Deng and Jianyuan Wang and Yi Zhang and Kaihao Zhang and Nick Barnes and Stan Birchfield and Lingpeng Kong and Yiran Zhong},
      year={2022},
      eprint={2206.10552},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Todo

Segmentation code
ImageNet21k pre-training

Weights

VVT on ImageNet-1K

Method	Input size	Acc@1 (%)	#Params (M)	Link
VVT-tiny	224	79.2	12.9	link
VVT-tiny	384	80.3	12.9
VVT-small	224	82.6	25.5	link
VVT-small	384	83.4	25.5
VVT-medium	224	83.8	47.9	link
VVT-large	224	84.1	61.8	link
VVT-large	384	84.7	61.8
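
For readers unfamiliar with the Acc@1 column, the following is a minimal sketch of how top-1 accuracy is commonly computed from classifier outputs. It is not taken from this repo's evaluation code; the function name `top1_accuracy` and the toy inputs are hypothetical and used for illustration only.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return top-1 accuracy in percent.

    `logits` holds one row of class scores per image; `labels` holds the
    ground-truth class index for each image. (Hypothetical helper, not part
    of this repo.)
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one sample is required")

    correct = 0
    for row, label in zip(logits, labels):
        # The predicted class is the index of the highest score.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    # Toy scores for 4 images over 3 classes (made-up numbers).
    toy_logits = [
        [0.1, 2.0, -1.0],
        [1.5, 0.2, 0.3],
        [0.0, 0.1, 0.9],
        [0.3, 0.2, 0.1],
    ]
    toy_labels = [1, 0, 1, 0]
    print(top1_accuracy(toy_logits, toy_labels))
```

In practice the scores would come from a model evaluated on the ImageNet-1K validation set; the sketch only illustrates the metric itself.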

Our code is built on TIMM and PVT.

Apache License 2.0