JDAI-CV / centerX

This repo is implemented based on detectron2 and centernet


multi teacher KD cannot achieve better performance than separate models

Guocode opened this issue

In the multi-teacher KD experiment, resdcn18_KD_woGT_scratch always performs slightly worse than resdcn18 on the crowd dataset, even when pretrained on ImageNet, yet it outperforms resdcn18 on another dataset. Why does this happen?

Good question!
There may be several reasons for this:

  1. The number of instances is imbalanced across the datasets:
    1). crowd human has 352978 human instances, and the other datasets have 1/3 or 1/8 of that.
    2). The crowd human model is well trained, while the other models are not trained well because they lack annotations.
    3). So multi-teacher KD lets the model train on more data, and for the datasets that lack training data this will increase mAP.
    4). Theoretically, a multi-class model is worse than a one-class model, so on crowd human, where the single-class model is already well trained, the multi-class KD model is worse than the baseline.
  2. My hyperparameters are not the best ones.
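
To make the imbalance in point 1 concrete, here is a rough arithmetic sketch. Only the 352978 figure and the 1/3 and 1/8 ratios come from the numbers above. The names `dataset_b` and `dataset_c` are placeholders, since it is not stated which dataset has which ratio, and the inverse-frequency weights are just one common way to rebalance sampling, not something this repo is known to do.

```python
# Instance count quoted above for crowd human.
CROWD_HUMAN_INSTANCES = 352978


def instance_shares(counts):
    """Fraction of all instances in the merged set that each dataset contributes."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def balancing_weights(counts):
    """Sampling weights inversely proportional to instance count.

    The largest dataset gets weight 1.0. This is an illustrative
    rebalancing scheme, not the repo's actual sampler.
    """
    largest = max(counts.values())
    return {name: largest / n for name, n in counts.items()}


# "dataset_b" and "dataset_c" are hypothetical names for the two smaller datasets.
counts = {
    "crowd_human": CROWD_HUMAN_INSTANCES,
    "dataset_b": CROWD_HUMAN_INSTANCES // 3,
    "dataset_c": CROWD_HUMAN_INSTANCES // 8,
}

shares = instance_shares(counts)
weights = balancing_weights(counts)

for name in counts:
    print(f"{name}: {counts[name]} instances, share={shares[name]:.3f}, weight={weights[name]:.2f}")
```

Under these assumptions, crowd human supplies roughly two thirds of all instances in the merged training set, so the student sees far more crowd human supervision than anything else.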

I didn't understand 1.3). Multi-teacher KD feeds the model more data than either isolated dataset provides, so theoretically it should perform better than either baseline model. My tentative explanation is that the crowd_human dataset covers a wider domain than wider face or coco car, so I guess that when the tasks are put together, the wider-domain task benefits the narrower ones but is hurt itself. So we still need to merge multiple datasets with different labels carefully until we find a method that reliably improves all of them.

What you said might be one of the reasons. The domain has a great effect.

“Theoretically it should perform better than either baseline model”: this is right when both the KD model and the baseline are one-class detectors, but I think a multi-class KD model might be worse than a single-class baseline on some specific datasets.
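
For readers following the thread, this is a minimal sketch of the setup being discussed: one multi-class student supervised by several single-class teachers, without ground truth. It is an assumption-laden illustration, not the loss centerX actually uses. The function names, the flat-list heatmaps, and the plain MSE are all hypothetical choices made to keep the example self-contained.

```python
def mse(a, b):
    """Mean squared error between two equal-length lists of scores."""
    if len(a) != len(b):
        raise ValueError("heatmaps must have the same number of cells")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_teacher_kd_loss(student_heatmaps, teacher_heatmaps):
    """Distill one multi-class student from several single-class teachers.

    student_heatmaps: class name -> flattened heatmap channel of the student.
    teacher_heatmaps: class name -> flattened heatmap of that class's teacher.
    Each teacher supervises only its own channel; no ground truth is used.
    Returns (mean loss over classes, per-class losses).
    """
    per_class = {
        cls: mse(student_heatmaps[cls], teacher_map)
        for cls, teacher_map in teacher_heatmaps.items()
    }
    total = sum(per_class.values()) / len(per_class)
    return total, per_class


# Toy 2x2 heatmaps, flattened. The values are made up for illustration.
student = {
    "person": [0.9, 0.1, 0.0, 0.2],
    "face": [0.0, 0.8, 0.1, 0.0],
}
teachers = {
    "person": [1.0, 0.0, 0.0, 0.0],
    "face": [0.0, 1.0, 0.0, 0.0],
}

total_loss, per_class_loss = multi_teacher_kd_loss(student, teachers)
print(total_loss, per_class_loss)
```

In this sketch the student shares one backbone across all channels while each single-class baseline keeps its whole capacity for one class, which is one way to read the multi-class versus single-class point above.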