JDAI-CV / FADA

(ECCV 2020) Classes Matter: A Fine-grained Adversarial Approach to Cross-domain Semantic Segmentation

Why is self-distill-training better?

taintpro98 opened this issue · comments

You trained Cityscapes with the self-distillation mode. I found that the training flow was not different from the train-on-source mode, and I don't understand why it performs better. Can you provide some fundamentals or theory that explain this? Thanks

Hi, I believe this is still an open question that lacks a concrete explanation. I would like to share some opinions, but I cannot promise they always hold true. We perform self-distillation mainly inspired by previous works such as Born-Again Neural Networks and Label Refinery. Self-distillation produces pseudo labels for the target-domain training images, and a student network trained on these pseudo labels performs better on the target domain than the teacher model.
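To make the procedure concrete, here is a minimal PyTorch sketch of one self-distillation step under the description above: a frozen teacher produces per-pixel pseudo labels for target images and the student is trained on them with cross-entropy. The names `teacher`, `student`, `optimizer`, and the confidence threshold are hypothetical placeholders, not the exact FADA implementation.

```python
import torch
import torch.nn.functional as F

def self_distill_step(teacher, student, optimizer, images, conf_thresh=0.9):
    """One self-distillation step on a batch of target-domain images.

    Assumes `teacher` and `student` are segmentation networks that map
    (N, 3, H, W) images to (N, C, H, W) logits; this is a sketch, not the
    repository's exact training loop.
    """
    # Teacher generates pseudo labels without gradient tracking.
    teacher.eval()
    with torch.no_grad():
        probs = F.softmax(teacher(images), dim=1)   # (N, C, H, W)
        conf, pseudo = probs.max(dim=1)             # per-pixel confidence and label
        pseudo[conf < conf_thresh] = 255            # ignore low-confidence pixels

    # Student is trained directly on the target images with the pseudo labels.
    student.train()
    logits = student(images)
    loss = F.cross_entropy(logits, pseudo, ignore_index=255)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this view the only difference from train-on-source mode is the source of supervision: the labels come from the teacher's predictions on target images rather than from source-domain ground truth.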

Personally, I believe the improvement mainly comes from two parts:

  1. Previous research showed that training a student network can produce predictions that are more consistent with the input image, so the student network should perform better than the pseudo labels provided by the teacher.

  2. Self-distillation is performed directly on target-domain training data, which helps the network learn a direct connection between target-domain images and target-domain labels, and thus adapt to the target domain better.