megvii-research / LGD

Official Implementation of the detection self-distillation framework LGD.

LGD: Label-Guided Self-Distillation for Object Detection

This is the official implementation of the AAAI 2022 paper [LGD: Label-Guided Self-Distillation for Object Detection](https://arxiv.org/abs/2109.11496).

Introduction

TL;DR: We propose the first self-distillation framework for general object detection, termed LGD (Label-Guided self-Distillation).

Abstract. In this paper, we propose the first self-distillation framework for general object detection, termed LGD (Label-Guided self-Distillation). Previous studies rely on a strong pretrained teacher to provide instructive knowledge that could be unavailable in real-world scenarios. Instead, we generate instructive knowledge by inter- and intra-object relation modeling, requiring only student representations and regular labels. Concretely, our framework involves sparse label-appearance encoding, inter-object relation adaptation and intra-object knowledge mapping to obtain the instructive knowledge. They jointly form an implicit teacher at the training phase, dynamically dependent on labels and evolving student representations. Modules in LGD are trained end-to-end with the student detector and are discarded at inference. Experimentally, LGD obtains decent results on various detectors, datasets, and extensive tasks like instance segmentation. For example, on the MS-COCO dataset, LGD improves RetinaNet with ResNet-50 under 2x single-scale training from 36.2% to 39.0% mAP (+2.8%). It boosts much stronger detectors like FCOS with ResNeXt-101 DCN v2 under 2x multi-scale training from 46.1% to 47.9% (+1.8%). Compared with the classical teacher-based method FGFI, LGD not only performs better without requiring a pretrained teacher but also reduces the training cost beyond inherent student learning by 51%.

Main Results (on MS-COCO, covering both the regular and supplementary sections of the paper)

Experiments are mainly conducted with 8x 2080 Ti GPUs. We provide results (Tables 1, 12 and 13 in the arXiv version) for common detection heads with various backbones, all equipped with FPN. In particular, to use the Swin-Tiny backbone (originally experimented with under the mmdetection environment) in detectron2, its ImageNet-pretrained weights need to be converted. We have done this for you and the converted weight file is available at LINK. Simply create a pretrained_backbones sub-directory under ${PROJ} and put the ".pth" file under it. We re-ran the experiments after a basic code refactoring for higher readability. The results are consistent, differing by only 0.1 mAP (mostly +0.1) from those reported in the arXiv version. Along with the code and results, we also release the relevant pretrained models and logs below.
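
For example (a minimal sketch; the checkpoint filename below is illustrative, use whatever the downloaded file is actually called):

# place the converted Swin-Tiny ImageNet-pretrained weights as described above
mkdir -p ${PROJ}/pretrained_backbones
mv /path/to/converted_swin_tiny.pth ${PROJ}/pretrained_backbones/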

RetinaNet

| Backbone | mAP | config | log | pretrained model |
| --- | --- | --- | --- | --- |
| R-50 | 40.4 | config | LINK | LINK |
| R-101 | 42.1 | config | LINK | LINK |
| R-101-DCN v2 | 44.5 | config | LINK | LINK |
| X-101-DCN v2 | 45.9 | config | LINK | LINK |
| Swin-Tiny | 45.9 | config | LINK | LINK |

FCOS

| Backbone | mAP | config | log | pretrained model |
| --- | --- | --- | --- | --- |
| R-50 | 42.4 | config | LINK | LINK |
| R-101 | 44.0 | config | LINK | LINK |
| R-101-DCN v2 | 46.3 | config | LINK | LINK |
| X-101-DCN v2 | 47.9 | config | LINK | LINK |

Faster R-CNN

| Backbone | mAP | config | log | pretrained model |
| --- | --- | --- | --- | --- |
| R-50 | 40.5 | config | LINK | LINK |
| R-101 | 42.2 | config | LINK | LINK |
| R-101-DCN v2 | 44.8 | config | LINK | LINK |
| X-101-DCN v2 | 46.2 | config | LINK | LINK |

Mask R-CNN

| Backbone | mAP (box) | mAP (mask) | config | log | pretrained model |
| --- | --- | --- | --- | --- | --- |
| Swin-Tiny | 46.4 | 42.5 | config | LINK | LINK |

Installation

This codebase is built upon [detectron2](https://github.com/facebookresearch/detectron2).

Requirements

  • Ubuntu 16.04 LTS, CUDA>=10.0, GCC>=5.4.0

  • Python>=3.6.12

  • Virtual environment via Anaconda (>=4.10.3) is recommended:

    conda create -n lgd python=3.7
    

    Activate it by

    conda activate lgd
    
  • detectron2==0.3 (one installation option is sketched at the end of this list)

  • PyTorch>=1.7.1, torchvision>=0.8.2

  • Other requirements

    pip3 install -r requirements.txt
    
  • Get into the LGD code directory (denoted by ${PROJ}).

    cd ${PROJ}
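
A note on the detectron2==0.3 requirement above: with PyTorch and torchvision already installed, one option (an illustrative sketch, not the only route; the official detectron2 installation guide also offers prebuilt wheels matched to your CUDA/PyTorch versions) is to build it from source at the v0.3 tag:

python3 -m pip install 'git+https://github.com/facebookresearch/detectron2.git@v0.3'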
    

Usage

Dataset preparation

For instance, download MS-COCO (https://cocodataset.org/), whose hierarchy is organized as follows:
MSCOCO
  |_ annotations
    |_ instances_train2017.json
    |_ instances_val2017.json
  |_ train2017
  |_ val2017

mkdir ${PROJ}/datasets
ln -s /path/to/MSCOCO datasets/coco

Training

Single Machine

python3 train.py --config-file ${CONFIG} --num-gpus ${NUM_GPUS} --resume

Note: We normally use 8 GPUs for each experiment, i.e., ${NUM_GPUS}=8.
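
If fewer GPUs are available, one common option (a hedged sketch; the default batch size and learning rate live in each config, so adjust the numbers accordingly) is to scale both down linearly via detectron2-style command-line overrides, e.g. halving them on 4 GPUs:

python3 train.py --config-file ${CONFIG} --num-gpus 4 --resume SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005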

Multiple Machine

For experiments with the Swin-Tiny backbone, which are prone to OOM, we opt for PyTorch's built-in distributed training across two machines, or a single 8-GPU machine with larger GPU memory (V100, etc.). Below we simply showcase the two-machine usage.

(1) Set the NCCL environment variables on both nodes:

export NCCL_IB_DISABLE=1
export NCCL_SOCKET_IFNAME=ib0
export NCCL_TREE_THRESHOLD=0
export GLOO_SOCKET_IFNAME=ib0

(2) Launch the training scripts:

python3 train.py --num-machines 2 --machine-rank 0 --num-gpus ${NUM_GPUS} --resume --config-file ${CONFIG}

Run the above command on the master node and get the TCP address from the screen log; then run the command below on the other machine:

python3 train.py --num-machines 2 --machine-rank 1 --num-gpus ${NUM_GPUS} --resume --dist-url ${TCP_ADDRESS} --config-file ${CONFIG}

Evaluation

It is handy to add the --eval-only option to turn the training command into an evaluation one.

python3 train.py --eval-only --config-file ${CONFIG} MODEL.WEIGHTS ${SNAPSHOT} MODEL.DISTILLATOR.EVAL_TEACHER False

License

Apache v2 © Base Model

Acknowledgement and special thanks

This repository adopts well-developed components (esp. detection head modules and backbone layers) from Detectron2. For more details about official detectron2, please check DETECTRON2. We also refer to the cvpods implementations of FCOS, ATSS and POTO. For the Detectron2-based Swin-Transformer backbone implementation, we adopt modules from a third-party implementation by Hu Ye (https://github.com/xiaohu2015/SwinT_detectron2/blob/main/README.md).

Citing LGD

If you find this work useful for your research or project, please consider citing our paper below.

@article{zhang2021lgd,
  title={LGD: Label-guided Self-distillation for Object Detection},
  author={Zhang, Peizhen and Kang, Zijian and Yang, Tong and Zhang, Xiangyu and Zheng, Nanning and Sun, Jian},
  journal={arXiv preprint arXiv:2109.11496},
  year={2021}
}
