yyliu01 / PS-MT

[CVPR'22] Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation

Home Page: https://arxiv.org/pdf/2111.12903.pdf

Iteration method

297774951 opened this issue · comments

Hello, I would like to ask about the iteration method used to update the two teachers in your paper (update the first teacher in one iteration, and the other teacher in the next). I saw the explanation in the paper that this is meant to increase the diversity between the two teachers. What are the benefits of updating the two teachers in this alternating way?

Hi @297774951

Instead of an iteration-wise update, we update each teacher's parameters on an epoch-wise schedule.

The teacher's parameters rely heavily on the student's, as they are updated via an exponential moving average (EMA). The SGD optimiser and the strong augmentations (including CutMix and colour jittering) encourage the student to learn different parameters in different epochs, which in turn yields different parameters for the dual teachers, leading to a relatively higher divergence between them.
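The epoch-alternating EMA schedule described above can be sketched as follows. This is a minimal, framework-free sketch, not the repository's code: plain dicts of floats stand in for network weights, and the names `ema_update` and `alternate_teachers` are illustrative.

```python
def ema_update(teacher, student, alpha=0.99):
    """EMA: teacher <- alpha * teacher + (1 - alpha) * student, per parameter."""
    return {k: alpha * v + (1.0 - alpha) * student[k] for k, v in teacher.items()}

def alternate_teachers(teacher_a, teacher_b, student_snapshots, alpha=0.99):
    """Epoch-wise alternation: even epochs update teacher A, odd epochs teacher B.

    student_snapshots: one student parameter dict per epoch; between snapshots
    the student is trained with SGD under strong augmentation, so consecutive
    snapshots differ, and each teacher accumulates a different EMA trajectory.
    """
    for epoch, student in enumerate(student_snapshots):
        if epoch % 2 == 0:
            teacher_a = ema_update(teacher_a, student, alpha)
        else:
            teacher_b = ema_update(teacher_b, student, alpha)
    return teacher_a, teacher_b
```

Because each teacher only sees every other epoch's student, the two EMA trajectories are fed different snapshots, which is what produces the divergence between them.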

Cheers,
Yuyuan

Why should the dual teachers have a relatively higher divergence?

Can you reply if you have time? Thank you so much

Is it to strengthen the perturbation of the network to improve the generalization of consistency learning?

Hi @wangmingaaaaa

Yes, the various strong perturbations cause the student network to be optimised differently in different epochs, so its updates to each teacher also differ. Please note that the comment about a "relatively higher divergence" is in comparison with the iteration-wise update method.

I believe the dual teachers will eventually fall into the same local minimum, just as normal MT does; our goal with this architecture is to obtain more reliable pseudo-labels throughout the training process.
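For reference, the CutMix perturbation mentioned above (in its generic form for segmentation, pasting a box from one sample into another and mixing the labels with the same mask) can be sketched as below. This is an illustrative sketch of the standard technique, not the repository's implementation; all function names are hypothetical.

```python
import numpy as np

def cutmix_mask(h, w, ratio=0.5, rng=None):
    """Binary box mask covering roughly `ratio` of the image area."""
    rng = rng or np.random.default_rng()
    cut_h, cut_w = int(h * ratio ** 0.5), int(w * ratio ** 0.5)
    top = rng.integers(0, h - cut_h + 1)
    left = rng.integers(0, w - cut_w + 1)
    mask = np.zeros((h, w), dtype=np.float32)
    mask[top:top + cut_h, left:left + cut_w] = 1.0
    return mask

def cutmix(img_a, img_b, lbl_a, lbl_b, mask):
    """Paste the masked box of sample B into sample A, for image and label alike.

    Images are CHW arrays; labels are HW class-index maps, so the label is
    mixed by hard selection rather than interpolation.
    """
    mixed_img = img_a * (1 - mask)[None] + img_b * mask[None]
    mixed_lbl = np.where(mask > 0, lbl_b, lbl_a)
    return mixed_img, mixed_lbl
```

Applying a different random mask each epoch is one source of the epoch-to-epoch variation in the student discussed above.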

Cheers,
Yuyuan

Thanks for your reply!

@wangmingaaaaa My pleasure!