Multi-teacher KD cannot achieve better performance than separate models

Question

Guocode opened this issue 4 years ago · comments

In the multi-teacher KD experiment, resdcn18_KD_woGT_scratch always performs a little worse than resdcn18 on the crowd dataset, even when pretrained on ImageNet, yet it outperforms the baseline on another dataset. Why does this happen?

cheng peng · Answer 1 · Wed Dec 02 2020 11:38:39 GMT+0800 (China Standard Time)

Good question!
There may be several reasons for this:

1. The number of instances is imbalanced across the datasets:
1) CrowdHuman has 352978 human instances, and the other datasets have 1/3 or 1/8 of that.
2) The CrowdHuman model is well trained, while the other models are not, for lack of annotations.
3) So multi-teacher KD lets the model train on more data, and the dataset that lacks training data will see an increase in mAP.
4) Theoretically, a multi-class model is worse than a one-class model, so on the well-trained CrowdHuman task the KD model is worse than the baseline.
2. My hyperparameters are not optimal.

Guocode · Answer 2 · Wed Dec 02 2020 15:22:39 GMT+0800 (China Standard Time)

I didn't understand 1.3). Multi-teacher KD feeds the student more data than either isolated dataset provides, so theoretically it should perform better than either baseline model. My attempt at an explanation is that the CrowdHuman dataset covers a wider domain than WIDER FACE or COCO car, so I guess that when they are put together the wider-domain task benefits the narrower one but hurts itself. So we still need to merge multiple datasets with different labels carefully until we find a method that definitively improves both.

cheng peng · Answer 3 · Wed Dec 02 2020 15:37:33 GMT+0800 (China Standard Time)

What you said might be one of the reasons. Domain has a great effect.

“theoretically it should perform better than either baseline model”: this is right when both the KD model and the baseline are one-class detectors, but I think a multi-class KD model might be worse than a single-class baseline on some specific datasets.
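
For readers unfamiliar with the setup under discussion, here is a minimal sketch of a multi-teacher KD objective without ground truth: one multi-class student is supervised only by the outputs of several single-class teachers. This is an illustration, not the repository's code. The function names, the flat-list heatmaps, the class names, and the choice of MSE as the distillation loss are all assumptions.

```python
# Hypothetical sketch: names, data layout, and the MSE loss are assumptions,
# not taken from the repository discussed in this thread.

def mse(a, b):
    """Mean squared error between two equally sized flat lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_teacher_kd_loss(student_maps, teacher_maps):
    """Sum of per-class distillation losses, with no ground-truth term.

    student_maps: class name -> the student's predicted heatmap for that class
    teacher_maps: class name -> the heatmap from that class's single-class teacher
    """
    per_class = {c: mse(student_maps[c], teacher_maps[c]) for c in teacher_maps}
    return sum(per_class.values()), per_class


# Toy 4-pixel heatmaps: one student output per class, one teacher per class.
student = {"person": [0.9, 0.1, 0.0, 0.2], "face": [0.0, 0.5, 0.0, 0.0]}
teachers = {"person": [1.0, 0.0, 0.0, 0.0], "face": [0.0, 1.0, 0.0, 0.0]}

total, per_class = multi_teacher_kd_loss(student, teachers)
print(total, per_class)
```

In a setup like this, every class term is minimized through the same shared student, which is one way to picture how a multi-class KD model can trail a single-class baseline on a particular dataset while still helping a data-poor one.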