Focus-DETR-mindspore

[ICCV 2023] Official implementation of the paper "Less is More: Focus Attention for Efficient DETR"

Focus-DETR

This is the official implementation of the paper "Less is More: Focus Attention for Efficient DETR".

Authors: Dehua Zheng, Wenhui Dong, Hailin Hu, Xinghao Chen, Yunhe Wang.

[arXiv] [BibTeX]

Focus-DETR is a model that focuses attention on more informative tokens for a better trade-off between computational efficiency and model accuracy. Compared with state-of-the-art sparse transformer-based detectors under the same setting, our Focus-DETR achieves comparable complexity while reaching 50.4 AP (+2.2 AP) on COCO.


Table of Contents

  • Main Results with Pretrained Models
  • Installation
  • Training
  • Evaluation
  • Citing Focus-DETR

Main Results with Pretrained Models

Here we provide the pretrained Focus-DETR weights based on detrex.

Pretrained focus_detr with ResNet Backbone
| Name | Backbone | Pretrain | Epochs | Denoising Queries | box AP | download |
| --- | --- | --- | --- | --- | --- | --- |
| Focus-DETR-R50-4scale | R-50 | IN1k | 12 | 100 | 48.8 | model |
| Focus-DETR-R50-4scale | R-50 | IN1k | 24 | 100 | 50.3 | model |
| Focus-DETR-R50-4scale | R-50 | IN1k | 36 | 100 | 50.4 | model |
| Focus-DETR-R101-4scale | R-101 | IN1k | 12 | 100 | 50.8 | model |
| Focus-DETR-R101-4scale | R-101 | IN1k | 24 | 100 | 51.2 | model |
| Focus-DETR-R101-4scale | R-101 | IN1k | 36 | 100 | 51.4 | model |

Pretrained focus_detr with Swin-Transformer Backbone

| Name | Backbone | Pretrain | Epochs | Denoising Queries | box AP | download |
| --- | --- | --- | --- | --- | --- | --- |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 12 | 100 | 50.0 | model |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 24 | 100 | 51.2 | model |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 36 | 100 | 52.5 | model |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN22k to IN1k | 36 | 100 | 53.2 | model |
| Focus-DETR-Swin-B-384-4scale | Swin-Base-384 | IN22k to IN1k | 36 | 100 | 56.2 | model |
| Focus-DETR-Swin-L-384-4scale | Swin-Large-384 | IN22k to IN1k | 36 | 100 | 56.3 | model |

Note:

  • Swin-X-384 means the backbone's pretraining resolution is 384 x 384, and IN22k to IN1k means the model is pretrained on ImageNet-22k and then fine-tuned on ImageNet-1k.

Installation

Please refer to the Installation Instructions for installation details.
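
For reference, the steps below sketch one common way to set up detrex from source; the repository URL and the submodule-based detectron2 install are assumptions here, and the linked instructions remain the authoritative guide.

# Sketch of a typical detrex source installation (assumed steps; follow the
# official Installation Instructions for the supported procedure).
git clone https://github.com/IDEA-Research/detrex.git
cd detrex
git submodule init
git submodule update
python -m pip install -e detectron2
python -m pip install -e .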

Training

All configs can be trained with:

cd detrex
python tools/train_net.py --config-file projects/focus_detr/configs/path/to/config.py --num-gpus 8

By default, we train with 8 GPUs and a total batch size of 16.
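
If you have fewer GPUs available, the GPU count and total batch size can be overridden from the command line in the same dotted key=value style used for train.init_checkpoint in the evaluation command below. The dataloader.train.total_batch_size key here is an assumption based on common detrex configs, so check your config file for the exact name.

cd detrex
# Hypothetical example: train on 4 GPUs with a proportionally reduced total batch size.
python tools/train_net.py --config-file projects/focus_detr/configs/path/to/config.py --num-gpus 4 dataloader.train.total_batch_size=8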

Evaluation

Model evaluation can be done as follows:

cd detrex
python tools/train_net.py --config-file projects/focus_detr/configs/path/to/config.py --eval-only train.init_checkpoint=/path/to/model_checkpoint
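
For example, to evaluate a checkpoint downloaded from the tables above (the checkpoint filename below is a placeholder; substitute the path of the weights you actually downloaded):

cd detrex
# Hypothetical example: evaluate a downloaded Focus-DETR-R50-4scale checkpoint.
python tools/train_net.py --config-file projects/focus_detr/configs/path/to/config.py --eval-only train.init_checkpoint=./focus_detr_r50_4scale.pth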

Citing Focus-DETR

If you find our work helpful for your research, please consider citing it with the following BibTeX entry.

@misc{zheng2023more,
      title={Less is More: Focus Attention for Efficient DETR}, 
      author={Dehua Zheng and Wenhui Dong and Hailin Hu and Xinghao Chen and Yunhe Wang},
      year={2023},
      eprint={2307.12612},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License: Apache License 2.0

