Focus-DETR-mindspore

[ICCV 2023] Official implementation of the paper "Less is More: Focus Attention for Efficient DETR"

Focus-DETR

This is the official implementation of the paper "Less is More: Focus Attention for Efficient DETR".

Authors: Dehua Zheng, Wenhui Dong, Hailin Hu, Xinghao Chen, Yunhe Wang.

[arXiv] [BibTeX]

Focus-DETR is a model that focuses attention on more informative tokens for a better trade-off between computational efficiency and model accuracy. Compared with state-of-the-art sparse transformer-based detectors under the same setting, our Focus-DETR achieves comparable complexity while reaching 50.4 AP (+2.2 AP) on COCO.


Table of Contents

  • Main Results with Pretrained Models
  • Installation
  • Training
  • Evaluation
  • Citing Focus-DETR

Main Results with Pretrained Models

Here we provide the pretrained Focus-DETR weights based on detrex.

Pretrained focus_detr with ResNet Backbone
| Name | Backbone | Pretrain | Epochs | Denoising Queries | box AP | download |
| --- | --- | --- | --- | --- | --- | --- |
| Focus-DETR-R50-4scale | R-50 | IN1k | 12 | 100 | 48.8 | model |
| Focus-DETR-R50-4scale | R-50 | IN1k | 24 | 100 | 50.3 | model |
| Focus-DETR-R50-4scale | R-50 | IN1k | 36 | 100 | 50.4 | model |
| Focus-DETR-R101-4scale | R-101 | IN1k | 12 | 100 | 50.8 | model |
| Focus-DETR-R101-4scale | R-101 | IN1k | 24 | 100 | 51.2 | model |
| Focus-DETR-R101-4scale | R-101 | IN1k | 36 | 100 | 51.4 | model |

Pretrained focus_detr with Swin-Transformer Backbone

| Name | Backbone | Pretrain | Epochs | Denoising Queries | box AP | download |
| --- | --- | --- | --- | --- | --- | --- |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 12 | 100 | 50.0 | model |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 24 | 100 | 51.2 | model |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 36 | 100 | 52.5 | model |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN22k to IN1k | 36 | 100 | 53.2 | model |
| Focus-DETR-Swin-B-384-4scale | Swin-Base-384 | IN22k to IN1k | 36 | 100 | 56.2 | model |
| Focus-DETR-Swin-L-384-4scale | Swin-Large-384 | IN22k to IN1k | 36 | 100 | 56.3 | model |

Note:

  • Swin-X-384 means the backbone's pretraining resolution is 384 x 384, and IN22k to IN1k means the model is pretrained on ImageNet-22k and then fine-tuned on ImageNet-1k.

Installation

Please refer to the Installation Instructions for installation details.
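
For reference, the steps below sketch one common way to set up detrex from source; the repository URL and the submodule-based detectron2 install are assumptions here, and the linked instructions remain the authoritative guide.

# Sketch of a typical detrex source installation (assumed steps; follow the
# official Installation Instructions for the supported procedure).
git clone https://github.com/IDEA-Research/detrex.git
cd detrex
git submodule init
git submodule update
python -m pip install -e detectron2
python -m pip install -e .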

Training

All configs can be trained with:

cd detrex
python tools/train_net.py --config-file projects/focus_detr/configs/path/to/config.py --num-gpus 8

By default, we train with 8 GPUs and a total batch size of 16.
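
If you have fewer GPUs available, the GPU count and total batch size can be overridden from the command line in the same dotted key=value style used for train.init_checkpoint in the evaluation command below. The dataloader.train.total_batch_size key here is an assumption based on common detrex configs, so check your config file for the exact name.

cd detrex
# Hypothetical example: train on 4 GPUs with a proportionally reduced total batch size.
python tools/train_net.py --config-file projects/focus_detr/configs/path/to/config.py --num-gpus 4 dataloader.train.total_batch_size=8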

Evaluation

Model evaluation can be done as follows:

cd detrex
python tools/train_net.py --config-file projects/focus_detr/configs/path/to/config.py --eval-only train.init_checkpoint=/path/to/model_checkpoint
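
For example, to evaluate a checkpoint downloaded from the tables above (the checkpoint filename below is a placeholder; substitute the path of the weights you actually downloaded):

cd detrex
# Hypothetical example: evaluate a downloaded Focus-DETR-R50-4scale checkpoint.
python tools/train_net.py --config-file projects/focus_detr/configs/path/to/config.py --eval-only train.init_checkpoint=./focus_detr_r50_4scale.pth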

Citing Focus-DETR

If you find our work helpful for your research, please consider citing it with the following BibTeX entry.

@misc{zheng2023more,
      title={Less is More: Focus Attention for Efficient DETR}, 
      author={Dehua Zheng and Wenhui Dong and Hailin Hu and Xinghao Chen and Yunhe Wang},
      year={2023},
      eprint={2307.12612},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License: Apache License 2.0

