megvii-research / LGD

Official Implementation of the detection self-distillation framework LGD.

Performance drop in some other detectors

Kizna1ver opened this issue

commented

Hi, thanks for the great work.
Have you ever tried LGD with more advanced detectors such as TOOD or DDOD? I reimplemented LGD in MMDetection and plugged it into TOOD and DDOD, but it gives lower performance than the baselines (DDOD mAP drops from 41.7 to 38.7, and TOOD from 42.3 to 38.7) in the R50-FPN 1x single-scale (1xss) setting. By the way, the code in your repo also contains an ATSS detector; have you tried ATSS with LGD? I didn't see this experiment in your paper.
It would be appreciated if you could provide more experiment info :)
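
For reference, this is roughly how I wired LGD around the student in my reimplementation (a schematic sketch with made-up names, not my actual MMDetection code and not the authors' code):

import torch.nn as nn
import torch.nn.functional as F

class LGDWrapper(nn.Module):
    # Schematic only: student detector + dynamic label-encoder teacher
    # + adapter, all trained jointly, mirroring the DISTILLATOR config below.
    def __init__(self, student, dynamic_teacher, adapter, lambda_distill=1.0):
        super().__init__()
        self.student = student          # any dense detector (ATSS/TOOD/DDOD)
        self.teacher = dynamic_teacher  # label encoder, optimized jointly
        self.adapter = adapter          # projects student features for matching
        self.lambda_distill = lambda_distill

    def forward(self, images, targets):
        # assumes the student returns its FPN features and detection losses
        feats, losses = self.student(images, targets)
        # the teacher builds target features from the GT labels (and, with
        # INTERACT_PATTERN=stuGuided, from the student features as well)
        teacher_feats = self.teacher(feats, targets)
        losses["loss_distill"] = self.lambda_distill * sum(
            F.mse_loss(self.adapter(f), t) for f, t in zip(feats, teacher_feats))
        return losses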

commented

I also tried the ATSSCT distillator in your repo with the cfg files below. LGD improves the mAP to 39.89 compared with the 39.42 baseline in the R50 1xss setting. The baseline mAP looks normal, but the gain from LGD seems rather small. Is this result as expected? I want to build some follow-up improvements on LGD, so this result matters a lot for my next steps. It would be appreciated if you could provide more experiment info. Thanks! @zhangpzh
Here are the cfg files:
atss_R_50_1xSS_prD30K_prS10K_bs16.yaml

_BASE_: "../../oss_baseline/Base-RetinaNet_1xss_bs16.yaml"
MODEL:
 META_ARCHITECTURE: 'DistillatorATSS'
 WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
 MASK_ON: False
 RESNETS:
   DEPTH: 50
 DISTILLATOR:
   TEACHER:
       META_ARCH: 'DynamicTeacher'
       SOLVER:
           OPTIMIZER: 'SGD'
           BASE_LR: 0.01
           MOMENTUM: 0.9
           WEIGHT_DECAY: 1e-4
           LR_SCHEDULER_NAME: "WarmupMultiStepLR"
           STEPS: (60000, 80000)
           GAMMA: 0.1
           WARMUP_FACTOR: 1e-3
            WARMUP_ITERS: 1000
           WARMUP_METHOD: "linear"
       INTERACT_PATTERN: 'stuGuided'
       DETACH_APPEARANCE_EMBED: False
       ADD_CONTEXT_BOX: True
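        # presumably: the teacher's interaction is guided by the student's
        # features ('stuGuided'), with context boxes added to the GT boxes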
   STUDENT:
       META_ARCH: 'ATSSCT'
       SOLVER:
           OPTIMIZER: 'SGD'
           BASE_LR: 0.01
           MOMENTUM: 0.9
           WEIGHT_DECAY: 1e-4
           LR_SCHEDULER_NAME: "WarmupMultiStepLR"
           STEPS: (60000, 80000)
           GAMMA: 0.1
           WARMUP_FACTOR: 1e-3
            WARMUP_ITERS: 1000
           WARMUP_METHOD: "linear"
   ADAPTER:
       META_ARCH: 'SequentialConvs'
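    # presumably the 'prD30K' / 'prS10K' in the file name: 30k iters without
    # distillation first, student backbone frozen for the first 10k iters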
   PRE_NONDISTILL_ITERS: 30000
   POST_NONDISTILL_ITERS: 0
   PRE_FREEZE_STUDENT_BACKBONE_ITERS: 10000
   LAMBDA: 1.0
   EVAL_TEACHER: True
INPUT:
 MIN_SIZE_TRAIN: (800,)
SOLVER:
 STEPS: (60000, 80000)
 MAX_ITER: 90000
# OUTPUT_DIR: 'outputs/RetinaNet/retinanet_R_50_1xSS_stuGuided_addCtxBox=YES_detachAppearanceEmbed=NO_preNondistillIters=30k_preFreezeStudentBackboneIters=10k/'
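
My reading of the distillation schedule encoded above (just a sketch of the logic as I understand it, not the repo's actual control flow):

def phase(it, pre_nondistill=30000, pre_freeze=10000):
    # PRE_NONDISTILL_ITERS: distillation loss is off for the first 30k iters;
    # PRE_FREEZE_STUDENT_BACKBONE_ITERS: the student backbone stays frozen
    # for the first 10k iters (my reading of the key names).
    return {
        "distill_on": it >= pre_nondistill,
        "student_backbone_frozen": it < pre_freeze,
    }

# e.g. phase(0)     -> {'distill_on': False, 'student_backbone_frozen': True}
#      phase(35000) -> {'distill_on': True,  'student_backbone_frozen': False}
# POST_NONDISTILL_ITERS=0 means distillation stays on until MAX_ITER=90000.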

Base-RetinaNet_1xss_bs16.yaml

_BASE_: "./bs32_schedule1x.yaml"
MODEL:
  META_ARCHITECTURE: "RetinaNet"
  # TODO: weights and depth
  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
  BACKBONE:
    NAME: "build_retinanet_resnet_fpn_backbone"
  RESNETS:
    # NORM: "SyncBN"
    OUT_FEATURES: ["res3", "res4", "res5"]
  ANCHOR_GENERATOR:
    SIZES: !!python/object/apply:eval ["[[x, x * 2**(1.0/3), x * 2**(2.0/3) ] for x in [32, 64, 128, 256, 512 ]]"]
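    # the eval above expands to the usual 3-scales-per-octave anchors
    # {2^0, 2^(1/3), 2^(2/3)} per base size, e.g. [32, 40.3, 50.8] at P3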
  FPN:
    # NORM: "SyncBN"
    IN_FEATURES: ["res3", "res4", "res5"]
  RETINANET:
    IOU_THRESHOLDS: [0.4, 0.5]
    IOU_LABELS: [0, -1, 1]
    SMOOTH_L1_LOSS_BETA: 0.0
DATASETS:
  TRAIN: ("coco_2017_train_oss",)
  TEST: ("coco_2017_val_oss",)
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.01  # Note that RetinaNet uses a different default learning rate
  STEPS: (60000, 80000)
  MAX_ITER: 90000
  CLIP_GRADIENTS: {"ENABLED": True}
  CHECKPOINT_PERIOD: 10000
  # warmup
  WARMUP_FACTOR: 1e-3
  WARMUP_ITERS: 1000
  WARMUP_METHOD: "linear"
INPUT:
  MIN_SIZE_TRAIN: (800,)
VERSION: 2
TEST:
  EVAL_PERIOD: 10000
OSS_PREFIX: '/data/oss_bucket_0/'
# OUTPUT_DIR: '' # specified by jobname in mdl args
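
In case it helps anyone reproducing this: the MODEL.DISTILLATOR keys are not in stock detectron2, so I inspect these files by allowing new keys when merging. A loading sketch (assumes the whole _BASE_ chain is on disk; this is not the repo's entry point):

from detectron2.config import get_cfg

cfg = get_cfg()
cfg.set_new_allowed(True)  # accept the custom MODEL.DISTILLATOR.* keys
# allow_unsafe handles the !!python/object/apply:eval anchor-size line
cfg.merge_from_file("atss_R_50_1xSS_prD30K_prS10K_bs16.yaml", allow_unsafe=True)
print(cfg.MODEL.DISTILLATOR.PRE_NONDISTILL_ITERS)  # 30000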