ZQSIAT / Vicinity-Vision-Transformer

Vicinity-Vision-Transformer

This repo is the official implementation of Vicinity Vision Transformer (VVT). It serves as a general-purpose backbone for image classification, semantic segmentation, and object detection tasks.

If you use this code, please cite:

@misc{sun2022vicinity,
      title={Vicinity Vision Transformer}, 
      author={Weixuan Sun and Zhen Qin and Hui Deng and Jianyuan Wang and Yi Zhang and Kaihao Zhang and Nick Barnes and Stan Birchfield and Lingpeng Kong and Yiran Zhong},
      year={2022},
      eprint={2206.10552},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Todo

  • Segmentation code
  • ImageNet21k pre-training

Weights

VVT on ImageNet-1K

Method       Input size   Acc@1 (%)   #Params (M)   Link
VVT-tiny     224          79.2        12.9          link
VVT-tiny     384          80.3        12.9
VVT-small    224          82.6        25.5          link
VVT-small    384          83.4        25.5
VVT-medium   224          83.8        47.9          link
VVT-large    224          84.1        61.8          link
VVT-large    384          84.7        61.8
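
The results listed above can be compared programmatically. The following is a minimal sketch, not part of this repo: it copies the Acc@1 numbers from the table into a dictionary and computes how much each model gains when the input size goes from 224 to 384. The names `RESULTS` and `resolution_gain` are hypothetical and chosen only for illustration.

```python
# Acc@1 (%) values copied from the ImageNet-1K table: (method, input size) -> accuracy.
RESULTS = {
    ("VVT-tiny", 224): 79.2,
    ("VVT-tiny", 384): 80.3,
    ("VVT-small", 224): 82.6,
    ("VVT-small", 384): 83.4,
    ("VVT-medium", 224): 83.8,
    ("VVT-large", 224): 84.1,
    ("VVT-large", 384): 84.7,
}


def resolution_gain(results, low=224, high=384):
    """Return {method: gain in accuracy points} for methods reported at both sizes.

    Hypothetical helper for illustration; methods missing either size are skipped.
    """
    gains = {}
    for (method, size), acc in results.items():
        if size == low and (method, high) in results:
            # Round to one decimal to match the precision of the table.
            gains[method] = round(results[(method, high)] - acc, 1)
    return gains


if __name__ == "__main__":
    for method, gain in resolution_gain(RESULTS).items():
        print(f"{method}: +{gain} points at 384 vs 224")
```

VVT-medium is skipped because the table reports it only at input size 224.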

Our code is built on TIMM and PVT.

License: Apache License 2.0

