BiFormer: Vision Transformer with Bi-Level Routing Attention

Official PyTorch implementation of BiFormer, from the following paper:

BiFormer: Vision Transformer with Bi-Level Routing Attention. CVPR 2023.
Lei Zhu, Xinjiang Wang, Zhanghan Ke, Wayne Zhang, and Rynson Lau

News

2023-03-18: We are improving the readability and efficiency of BRA (Bi-Level Routing Attention); please stay tuned.
- We value reproducibility, so we keep the implementation we used during the exploration stage. It is a bit messy: many components/arguments are kept but unused, which may be distracting.
- To make the code more readable and optimization-friendly, we are refactoring the interface. We expect to finish this within two weeks.
- After refactoring, we will start optimizing BRA with CUDA to make it more memory- and compute-efficient.
- Collaborations and contributions are welcome, especially from CUDA/CUTLASS experts. There is a chance to co-author a paper.

Results and Pre-trained Models

ImageNet-1K trained models

name	resolution	acc@1	#params	FLOPs	model	log	tensorboard log*
BiFormer-T	224x224	81.4	13.1 M	2.2 G	model	log	-
BiFormer-S	224x224	83.8	25.5 M	4.5 G	model	log	tensorboard.dev
BiFormer-B	224x224	84.3	56.8 M	9.8 G	model	log	-
BiFormer-STL	224x224	82.7	28.4 M	4.6 G	model	log	-

* : reproduced after the acceptance of our paper.

BiFormer-STL (Swin-Tiny-Layout) is the model used in our ablation study. We hope it provides a good starting point for developing your own awesome attention mechanisms.

All files can be accessed from OneDrive.

Installation

Please check INSTALL.md for installation instructions.

Evaluation

We ran evaluation in a Slurm cluster environment using the command below:

python hydra_main.py \
    data_path=./data/in1k input_size=224  batch_size=128 dist_eval=true \
    +slurm=${CLUSTER_ID} slurm.nodes=1 slurm.ngpus=8 \
    eval=true load_release=true model='biformer_small'
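The dist_eval=true override spreads evaluation over all GPUs. Assuming the repository follows the DeiT/ConvNeXt convention of wrapping the validation set in torch.utils.data.DistributedSampler, each process scores only its own shard. The pure-Python sketch below mimics that sampler's non-shuffled split; shard_indices is a hypothetical helper, not part of this codebase.

```python
import math


def shard_indices(dataset_len: int, num_replicas: int, rank: int) -> list[int]:
    """Indices scored by one process, as in DistributedSampler(shuffle=False, drop_last=False)."""
    num_samples = math.ceil(dataset_len / num_replicas)  # per-process count
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    padding = total_size - dataset_len
    # Pad by wrapping around to the start, so some samples are scored twice.
    indices += (indices * math.ceil(padding / dataset_len))[:padding]
    # Interleaved split: rank r takes positions r, r + num_replicas, ...
    return indices[rank:total_size:num_replicas]


shards = [shard_indices(50_000, 8, rank) for rank in range(8)]
print({len(s) for s in shards})  # {6250}

uneven = [shard_indices(50_001, 8, rank) for rank in range(8)]
duplicates = sum(len(s) for s in uneven) - 50_001
print(duplicates)  # 7
```

With the 50,000 ImageNet validation images on 8 GPUs, every process gets exactly 6,250 images. When the set does not divide evenly, the sampler pads it with repeated samples, which can shift the reported accuracy slightly.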

To test on a local machine, you can run:

python -m torch.distributed.launch --nproc_per_node=8 main.py \
  --data_path ./data/in1k --input_size 224 --batch_size 128 --dist_eval \
  --eval --load_release --model biformer_small
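The two evaluation commands pass the same settings in two syntaxes: Hydra overrides (key=value) for hydra_main.py and argparse flags for main.py. The sketch below spells out that mapping, which can help translate the Slurm training command for a local run. hydra_to_argparse is a hypothetical helper; it assumes main.py exposes every key as a flag of the same name, which this README only shows for the evaluation keys.

```python
def hydra_to_argparse(overrides: list[str]) -> list[str]:
    """Translate hydra_main.py overrides into main.py flags (hypothetical helper)."""
    args: list[str] = []
    for item in overrides:
        key, value = item.split("=", 1)
        key = key.lstrip("+")
        # Slurm settings only matter to the cluster launcher.
        if key == "slurm" or key.startswith("slurm."):
            continue
        value = value.strip("'\"")  # drop shell-style quotes such as 'biformer_small'
        if value == "true":
            args.append(f"--{key}")  # boolean switch, e.g. --dist_eval
        elif value == "false":
            continue  # assumption: an omitted switch defaults to off
        else:
            args.extend([f"--{key}", value])
    return args


EVAL_OVERRIDES = [
    "data_path=./data/in1k", "input_size=224", "batch_size=128", "dist_eval=true",
    "+slurm=${CLUSTER_ID}", "slurm.nodes=1", "slurm.ngpus=8",
    "eval=true", "load_release=true", "model='biformer_small'",
]
eval_args = hydra_to_argparse(EVAL_OVERRIDES)
print(" ".join(eval_args))
# --data_path ./data/in1k --input_size 224 --batch_size 128 --dist_eval --eval --load_release --model biformer_small
```

Applied to the training overrides, it would turn drop_path=0.15 lr=5e-4 into --drop_path 0.15 --lr 5e-4, provided main.py defines those flags.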

This should print:

* Acc@1 83.754 Acc@5 96.638 loss 0.869
Accuracy of the network on the 50000 test images: 83.8%
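Acc@1 and Acc@5 are top-1 and top-5 accuracy: the percentage of images whose true class is the highest-scoring prediction, or among the five highest-scoring ones. A minimal pure-Python sketch of the metric follows; it is illustrative only and not the repository's implementation.

```python
def topk_accuracy(logits: list[list[float]], targets: list[int],
                  ks: tuple[int, ...] = (1, 5)) -> dict[int, float]:
    """Top-k accuracy in percent for each k in ks."""
    hits = {k: 0 for k in ks}
    for scores, target in zip(logits, targets):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        for k in ks:
            if target in ranked[:k]:
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(targets) for k in ks}


# Two samples, three classes: the first is right at top-1, the second only at top-2.
logits = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
targets = [1, 1]
print(topk_accuracy(logits, targets, ks=(1, 2)))  # {1: 50.0, 2: 100.0}
```

Rounded to one decimal, the Acc@1 of 83.754 above is the 83.8 reported for BiFormer-S in the results table.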

Note: With load_release=true (or --load_release for main.py), the released checkpoint is downloaded automatically, so you do not need to download it in advance.
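The automatic download amounts to a download-once cache. The sketch below illustrates that pattern with a pluggable downloader; resolve_checkpoint, fake_download, and the <name>.pth file naming are hypothetical, since this README does not say where or how the checkpoints are cached. For real HTTP URLs, PyTorch's torch.hub.load_state_dict_from_url offers comparable caching.

```python
import tempfile
from pathlib import Path
from typing import Callable


def resolve_checkpoint(name: str, cache_dir: Path,
                       download: Callable[[str, Path], None]) -> Path:
    """Return the cached checkpoint for `name`, downloading it only if it is missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{name}.pth"  # hypothetical file naming
    if not path.exists():
        # Write to a temporary file first so an interrupted download never
        # leaves a truncated file that later looks like a cache hit.
        partial = path.with_suffix(".part")
        download(name, partial)
        partial.replace(path)
    return path


downloads: list[str] = []


def fake_download(name: str, dest: Path) -> None:
    """Stand-in for a real network fetch."""
    downloads.append(name)
    dest.write_bytes(b"weights")


with tempfile.TemporaryDirectory() as tmp_dir:
    first = resolve_checkpoint("biformer_small", Path(tmp_dir), fake_download)
    second = resolve_checkpoint("biformer_small", Path(tmp_dir), fake_download)
    print(first == second, downloads)  # True ['biformer_small']
```

The second call finds the file already in the cache and skips the download, which is the behavior the note above describes.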

Training

To launch training on a Slurm cluster, use the command below:

python hydra_main.py \
    data_path=./data/in1k input_size=224  batch_size=128 dist_eval=true \
    +slurm=${CLUSTER_ID} slurm.nodes=1 slurm.ngpus=8 \
    model='biformer_small'  drop_path=0.15 lr=5e-4
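Two of these settings are worth unpacking. Assuming batch_size is the per-GPU batch (as in ConvNeXt's main.py), the global batch is 128 × 8 GPUs × 1 node = 1024. drop_path=0.15 sets the stochastic-depth rate; timm-style backbones such as ConvNeXt treat it as the maximum rate and increase it linearly from 0 over all blocks. The sketch below assumes BiFormer does the same, and its stage depths are purely illustrative.

```python
def global_batch_size(per_gpu: int, gpus_per_node: int, nodes: int) -> int:
    """Images per optimizer step across all processes (assumes batch_size is per GPU)."""
    return per_gpu * gpus_per_node * nodes


def drop_path_rates(max_rate: float, depths: list[int]) -> list[float]:
    """Stochastic-depth rate per block, rising linearly from 0 to max_rate.

    Matches torch.linspace(0, max_rate, sum(depths)) up to float rounding.
    """
    total = sum(depths)
    if total == 1:
        return [0.0]
    return [max_rate * i / (total - 1) for i in range(total)]


EXAMPLE_DEPTHS = [4, 4, 18, 4]  # hypothetical stage depths, for illustration only

batch = global_batch_size(128, 8, 1)
rates = drop_path_rates(0.15, EXAMPLE_DEPTHS)
print(batch)  # 1024
print(f"{rates[0]:.3f} ... {rates[-1]:.3f}")  # 0.000 ... 0.150
```

Early blocks are almost never dropped while the deepest ones are skipped with probability close to drop_path, which is the usual rationale for the linear schedule.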

Note: Our codebase automatically creates an output directory for experiment logs and checkpoints, named after the passed arguments. For example, the command above produces an output directory like:

$ tree -L 3 outputs/ 
outputs/
└── cls
    └── batch_size.128-drop_path.0.15-input_size.224-lr.5e-4-model.biformer_small-slurm.ngpus.8-slurm.nodes.1
        └── 20230307-21:33:26
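The directory name appears to be the passed overrides, minus a few keys, written as key.value pairs, sorted and joined with "-", followed by a timestamp subdirectory. The sketch below reconstructs that pattern; the exclusion list is inferred from the example and is hypothetical. Hydra can generate similar names itself via hydra.job.override_dirname (configurable with kv_sep, item_sep and exclude_keys), but whether this repository uses that mechanism is an assumption.

```python
from datetime import datetime

# Hypothetical: keys that the example directory name leaves out.
EXCLUDED_KEYS = {"data_path", "dist_eval", "slurm"}


def experiment_dir(overrides: list[str], started: datetime, task: str = "cls") -> str:
    """Build outputs/<task>/<sorted key.value pairs>/<timestamp> from Hydra-style overrides."""
    pairs = []
    for item in overrides:
        key, value = item.split("=", 1)
        key = key.lstrip("+")  # "+slurm=..." selects a config group
        if key in EXCLUDED_KEYS:
            continue
        value = value.strip("'\"")  # drop shell-style quotes
        pairs.append(f"{key}.{value}")
    name = "-".join(sorted(pairs))
    return f"outputs/{task}/{name}/{started:%Y%m%d-%H:%M:%S}"


TRAIN_OVERRIDES = [
    "data_path=./data/in1k", "input_size=224", "batch_size=128", "dist_eval=true",
    "+slurm=${CLUSTER_ID}", "slurm.nodes=1", "slurm.ngpus=8",
    "model='biformer_small'", "drop_path=0.15", "lr=5e-4",
]
path = experiment_dir(TRAIN_OVERRIDES, datetime(2023, 3, 7, 21, 33, 26))
print(path)
```

With the overrides from the training command (slurm.nodes=1), the sketch yields a directory name ending in slurm.nodes.1, under a 20230307-21:33:26 timestamp subdirectory.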

Acknowledgement

This repository is built using the timm library and the ConvNeXt and UniFormer repositories.

License

This project is released under the MIT license. Please see the LICENSE file for more information.

Citation

If you find this repository helpful, please consider citing:

@InProceedings{zhu2022biformer,
  author    = {Lei Zhu and Xinjiang Wang and Zhanghan Ke and Wayne Zhang and Rynson Lau},
  title     = {BiFormer: Vision Transformer with Bi-Level Routing Attention},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023},
}

About

[CVPR 2023] Official code release of our paper "BiFormer: Vision Transformer with Bi-Level Routing Attention"

https://arxiv.org/abs/2303.08810
