zhangxiaosong18 / hivit

HiViT (ICLR2023, notable-top-25%)

This is the official implementation of the paper HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer.

Results

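| Model       | Pretraining data | ImageNet-1K (top-1 %) | COCO Det (box AP) | ADE Seg (mIoU) |
|-------------|------------------|-----------------------|-------------------|----------------|
| MAE-base    | ImageNet-1K      | 83.6                  | 51.2              | 48.1           |
| SimMIM-base | ImageNet-1K      | 84.0                  | 52.3              | 52.8           |
| HiViT-base  | ImageNet-1K      | 84.6                  | 53.3              | 52.8           |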

Pre-trained Models

mae_hivit_base_1600ep.pth

mae_hivit_base_1600ep_ft100ep.pth
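Before plugging a downloaded checkpoint into a training script, it can help to look at what the file contains. The following is a minimal sketch, not part of this repository: the helper names (`unwrap_state_dict`, `summarize`, `load_checkpoint`) are hypothetical, and the assumption that the weights sit either at the top level or under a `"model"` or `"state_dict"` key should be checked against the actual file.

```python
def unwrap_state_dict(checkpoint, keys=("model", "state_dict")):
    """Return the parameter dict from a loaded checkpoint object.

    Assumption: weights are stored at the top level or nested under one
    of `keys`. This layout is a guess; verify it for the real file.
    """
    if isinstance(checkpoint, dict):
        for key in keys:
            nested = checkpoint.get(key)
            if isinstance(nested, dict):
                return nested
    return checkpoint


def summarize(state_dict, limit=5):
    """Return (name, shape) pairs for the first `limit` entries."""
    rows = []
    for name, value in list(state_dict.items())[:limit]:
        # Tensors expose `.shape`; anything else is reported as ().
        shape = tuple(getattr(value, "shape", ()))
        rows.append((name, shape))
    return rows


def load_checkpoint(path):
    """Load a .pth file onto the CPU (requires PyTorch)."""
    import torch  # imported lazily so the helpers above work without it

    # Recent PyTorch releases default to weights_only=True, which may
    # reject checkpoints that store non-tensor objects. Only relax that
    # setting for files you trust.
    return torch.load(path, map_location="cpu")
```

With PyTorch installed and a checkpoint downloaded, `summarize(unwrap_state_dict(load_checkpoint("mae_hivit_base_1600ep.pth")))` would list the first few parameter names and shapes, which is a quick way to confirm that the file matches the model you intend to build.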

Usage

1. Supervised learning on ImageNet-1K: see supervised/get_started.md for a quick start.

2. Self-supervised learning on ImageNet-1K: see self_supervised/get_started.md.

3. Object detection: see detection/get_started.md.

4. Semantic segmentation: see segmentation/get_started.md.

Bibtex

Please consider citing our paper in your publications if the project helps your research.

@inproceedings{zhanghivit,
  title={HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer},
  author={Zhang, Xiaosong and Tian, Yunjie and Xie, Lingxi and Huang, Wei and Dai, Qi and Ye, Qixiang and Tian, Qi},
  booktitle={International Conference on Learning Representations},
  year={2023},
}

About

License: MIT License

