HiLab-git / SSL4MIS

Semi Supervised Learning for Medical Image Segmentation, a collection of literature reviews and code implementations.

Question about MSE used here.

JohnMiao1 opened this issue

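Hello! I have only recently started working on semi-supervised learning. Thank you very much for the papers and code you have collected. However, when I ran train_efficient_unet_2D_mean_teacher.py, I found that including the MSE consistency loss actually made the results worse. I compared training with and without the MSE term (everything else kept the same) and tested the final-iteration model from each run with test_efficient_unet_2D_acdc.py. With the MSE term, the test results seemed to collapse (using the ACDC data you provided): the Dice of the worst class was only about 0.5. I also noticed that the saved best model usually comes from an early iteration, when the MSE term itself is still small. Could introducing the MSE term be what makes training go wrong? Have you seen anything similar when running the code, or is there anything I should pay special attention to? So far I have run everything only once with the default parameters.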
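For readers new to the setup: the "MSE consistency loss" in a mean-teacher model compares the student's and the teacher's softmax outputs on the same (typically unlabeled) input. Below is a minimal pure-Python sketch, assuming per-pixel class logits stored as nested lists. The function names are hypothetical and not taken from the SSL4MIS code, which works on PyTorch tensors.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over the class dimension."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_mse_consistency(student_logits: List[List[float]],
                            teacher_logits: List[List[float]]) -> float:
    """Mean squared difference between student and teacher class probabilities.

    Each argument is a list of pixels, and each pixel is a list of per-class
    logits. In a real framework the teacher output would be a fixed target
    (no gradient). The result is averaged over all pixels and classes.
    """
    total, count = 0.0, 0
    for s, t in zip(student_logits, teacher_logits):
        for ps, pt in zip(softmax(s), softmax(t)):
            total += (ps - pt) ** 2
            count += 1
    return total / count
```

Identical outputs give a loss of 0. The value approaches 1 when the two models confidently disagree on a two-class pixel.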

Hi, JohnMiao1,
Yes, I have run into this many times when training the mean-teacher model. To alleviate the problem, you can train the student model with labeled data only for the first epochs (perhaps 30 or 50 epochs are enough), and then use both labeled and unlabeled data for mean-teacher training. Alternatively, the uncertainty-aware mean teacher is more stable and powerful than the original mean teacher, so you could try it on your task. By the way, some novel methods have been proposed to solve this problem. Sorry, I cannot recommend specific ones, but you can search CVPR, ICCV, ECCV papers, and so on.
Best wishes,
Xiangde Luo.
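The suggestion to train on labeled data first can be expressed as a schedule for the weight of the consistency term: zero during a supervised warm-up phase, then switched on. Below is a minimal sketch. The parameter names `warmup_epochs` and `max_weight` are hypothetical. The default of 30 epochs is only the rough figure from the reply above, and 0.1 is an illustrative value, not the repository's default.

```python
def consistency_weight(epoch: int, warmup_epochs: int = 30,
                       max_weight: float = 0.1) -> float:
    """Weight of the unlabeled consistency term at a given epoch.

    Only the supervised loss is used during the first `warmup_epochs` epochs
    (weight 0). After that, the consistency term is switched on at max_weight.
    """
    return 0.0 if epoch < warmup_epochs else max_weight


def total_loss(supervised: float, consistency: float, epoch: int) -> float:
    """Combine the supervised and consistency losses under the warm-up schedule."""
    return supervised + consistency_weight(epoch) * consistency
```

In practice this hard switch is often combined with a gradual ramp-up of the weight, so the consistency term does not jump from 0 to its full value in a single epoch.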

Hi, Xiangde,
Thank you so much for your responses. Since the MSE term affects performance, should we still test the best model (highest validation performance), or just use the model from the last iteration? The problem is that we cannot attribute the improvement to semi-supervision, because the best model was saved when the consistency loss had only just started to take effect. I suspect that even if we used only the labeled data within the popular mean-teacher framework, we could get a small improvement just by running it several times. Meanwhile, the model from the last iteration is usually much worse than the best model. I would really appreciate more clues, or a few of the novel methods you mentioned above for alleviating this problem. I have no idea what keywords I should search for. Thanks a lot.
Best wishes,
John

Hi, John;
In my experiments, I used the best model on the validation set rather than the model from the last iteration or epoch. I want a high-performance model for practical use and don't care about the last iteration's or epoch's performance. As for the second question, the uncertainty-aware mean teacher may be a good answer.
Best wishes,
Xiangde Luo
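Selecting "the best model on the validation set" amounts to evaluating periodically and keeping the checkpoint with the highest validation Dice. Below is a minimal sketch, assuming validation runs every few iterations and produces one mean Dice per evaluation. The dictionary layout (iteration → score) and the function name are hypothetical. In a real training loop you would save the model weights whenever the best score improves.

```python
from typing import Dict, Tuple


def select_best_checkpoint(val_dice: Dict[int, float]) -> Tuple[int, float]:
    """Return (iteration, dice) for the checkpoint with the highest validation Dice.

    Ties are broken in favour of the earliest iteration.
    """
    best_iter, best_dice = -1, float("-inf")
    for iteration in sorted(val_dice):
        if val_dice[iteration] > best_dice:
            best_iter, best_dice = iteration, val_dice[iteration]
    return best_iter, best_dice
```

Reporting both the selected checkpoint and the last one shows when performance degrades later in training, which is the effect described in the original question.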

Hi, Xiangde,
Thanks again. From your reply, do you mean that the MSE problem is caused by the unlabeled data, especially because predictions on those samples may be really poor? If we use these poor predictions to calculate the MSE loss and update the model with the corresponding gradients, that would hurt model performance. So estimating uncertainty might alleviate this problem. Is that right?
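The idea here, filtering out unreliable teacher predictions before computing the MSE, is roughly what the uncertainty-aware mean teacher (UA-MT) does. The teacher is run several times with stochastic perturbations (such as dropout and input noise), and the entropy of the averaged prediction serves as a per-pixel uncertainty. Only low-uncertainty pixels then contribute to the consistency loss. Below is a simplified pure-Python sketch. The fixed threshold and the function names are assumptions for illustration, not the paper's or the repository's code.

```python
import math
from typing import List


def mean_probs(samples: List[List[float]]) -> List[float]:
    """Average T stochastic teacher predictions (each a probability vector)."""
    t = len(samples)
    return [sum(s[c] for s in samples) / t for c in range(len(samples[0]))]


def entropy(p: List[float]) -> float:
    """Predictive entropy in nats; higher means more uncertain."""
    return -sum(pc * math.log(pc) for pc in p if pc > 0.0)


def uncertainty_masked_mse(student: List[List[float]],
                           teacher_samples: List[List[List[float]]],
                           threshold: float) -> float:
    """MSE between student and mean-teacher probabilities, restricted to
    pixels whose teacher uncertainty is below `threshold`.

    student[i] is the student's probability vector for pixel i, and
    teacher_samples[i] holds T stochastic teacher predictions for pixel i.
    Returns 0.0 if every pixel is masked out.
    """
    total, count = 0.0, 0
    for s, samples in zip(student, teacher_samples):
        target = mean_probs(samples)
        if entropy(target) < threshold:  # keep only confident teacher targets
            total += sum((a - b) ** 2 for a, b in zip(s, target))
            count += len(s)
    return total / count if count else 0.0
```

In the UA-MT paper the threshold is raised during training so that more pixels are admitted over time. This sketch keeps it fixed.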

Hi, John,
Sorry, I don't know why the performance drops as the number of iterations increases. Maybe your understanding is right, and I think it points to a good direction for further research on the mean-teacher model. For the second question, the answer is yes; please read Dr. Lequan Yu's MICCAI 2019 paper.
Best wishes,
Xiangde Luo.

Hi, Xiangde,
Thank you so much~.
Best wishes,
John

Hi, Xiangde,
I have another question. I am really confused about the coefficient of the MSE consistency loss. Could you please give some tips on its value and ramp-up schedule? Do you usually set the maximum coefficient to 0.01 and increase it slowly from a very small number to 0.01 by the last iteration?
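A common ramp-up in mean-teacher code is a sigmoid-shaped curve, exp(-5(1 - t)^2), where t is training progress clipped to [0, 1]. It comes from Laine and Aila's temporal ensembling and is reused in the mean-teacher paper. The coefficient is then the maximum weight times this factor. Below is a minimal sketch with illustrative parameter names. Whether the ramp ends at the last iteration or earlier is a tunable choice, not something fixed by the method.

```python
import math


def sigmoid_rampup(current: float, rampup_length: float) -> float:
    """Sigmoid-shaped ramp from about exp(-5) to 1: exp(-5 * (1 - t)^2), t = current / length."""
    if rampup_length == 0:
        return 1.0
    t = min(max(current, 0.0), rampup_length) / rampup_length
    phase = 1.0 - t
    return math.exp(-5.0 * phase * phase)


def consistency_coefficient(current: float, max_weight: float,
                            rampup_length: float) -> float:
    """Consistency-loss coefficient at step (or epoch) `current`."""
    return max_weight * sigmoid_rampup(current, rampup_length)
```

With this curve the weight starts at roughly 0.7% of `max_weight` rather than exactly zero, rises slowly at first, and stays at `max_weight` once `current` reaches `rampup_length`.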

Hi John,
Maybe you can read "Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results" for more details. By the way, the ramp-up strategy is widely used in semi-supervised model training, so you can adapt it to your task.
Best wishes,
Xiangde Luo.
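The "weight-averaged consistency targets" in that paper's title refer to the teacher's weights being an exponential moving average (EMA) of the student's weights. Below is a minimal sketch over flat lists of parameters; PyTorch code applies the same update tensor by tensor without tracking gradients. The early-training cap on the decay follows a pattern common in mean-teacher implementations, but treat the exact formula as an assumption.

```python
from typing import List


def update_ema(teacher: List[float], student: List[float],
               alpha: float, global_step: int) -> List[float]:
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student.

    Early in training the decay is capped at 1 - 1/(step + 1), so the teacher
    is not anchored to its random initialisation.
    """
    alpha = min(1.0 - 1.0 / (global_step + 1), alpha)
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]
```

At step 0 the teacher simply copies the student. Later it moves only a fraction (1 - alpha) toward the student at each step, which is what makes its predictions a smoother target than the student's own.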

Hi, Xiangde,
Thanks. I read the "mean teachers" paper, and there the maximum coefficient was set to 10 or 100, which is not the default value in your code. So I am a bit confused, because I still cannot see any improvement from semi-supervision in my own experiments and I am not sure what is wrong.
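One possible reason coefficients differ so much between codebases is that the numeric scale of an MSE consistency loss depends on how it is reduced; the thread does not settle this. Summing the squared errors over pixels and classes instead of averaging them multiplies the loss by the number of elements. A coefficient tuned for one reduction is therefore not directly transferable to the other. The sketch below only illustrates this scale difference and does not describe the reduction used by any particular codebase.

```python
from typing import List, Tuple


def mse_sum_and_mean(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Squared-error loss between two equal-length vectors, reduced two ways."""
    sq = [(x - y) ** 2 for x, y in zip(a, b)]
    return sum(sq), sum(sq) / len(sq)
```

For example, for a hypothetical 256 × 256 slice with 4 classes (262,144 elements) and a uniform probability gap of 0.1, the mean-reduced loss is 0.01. The sum-reduced loss is about 2,621, more than five orders of magnitude larger.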

Hi John,
Sorry, I don't know what problems you ran into, but you can change the default value for your task. By the way, EfficientUNet is very powerful when fine-tuned on ACDC, so semi-supervised methods may improve its results only slightly; that is why many published papers use UNet as the baseline. I will update the code tomorrow for a fairer comparison.
Best wishes,
Xiangde Luo.

Hi Xiangde,
It's so kind of you to provide the U-Net code. Looking forward to your update.

Hi, John,
The code has now been updated! You can try it simply by running bash train_acdc.sh.
Best wishes,
Xiangde Luo.

Hi, Xiangde,
Thanks so much! I'll try it as soon as possible. I am also running your code (the old version) directly on the ACDC dataset you provided, to get a better feel for it. In your experiments, what Dice do you get for each class with and without semi-supervision? Would it be convenient for you to share these results (if possible, for both U-Net and EfficientUNet)?
Best wishes,
John

Hi, John;
Sorry, I will release all results after I submit the manuscript, but the work is still ongoing. Maybe you can run the code several times or use more labeled data to obtain stable results.
Best wishes,
Xiangde Luo.

Hi, Xiangde,
All right. Thanks for your tips.
Best wishes,
John

Hi, Xiangde,
I tried your U-Net code directly. Using the ACDC dataset you provided, I got test Dice scores of (75.97, 79.47, 84.81) with "train_unet_2D_fully_supervised.py" and (72.43, 79.35, 86.65) with "train_unet_2D_mean_teacher.py". The improvement seems marginal, and performance even dropped on the first class (from 75.97 to 72.43). I'm not sure whether this is normal.
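For reference, a per-class Dice score like those above is 2|P ∩ G| / (|P| + |G|), computed separately for each foreground class. Below is a minimal sketch on flat label maps with hypothetical function names. It reports 1.0 when a class is absent from both prediction and ground truth; that convention is an assumption, and evaluation scripts differ on this point.

```python
from typing import Dict, List


def per_class_dice(pred: List[int], gt: List[int],
                   num_classes: int) -> Dict[int, float]:
    """Dice coefficient for each foreground class (labels 1..num_classes-1).

    Dice = 2 * |P ∩ G| / (|P| + |G|). If a class is absent from both the
    prediction and the ground truth, its Dice is reported as 1.0 (an assumed
    convention).
    """
    scores = {}
    for c in range(1, num_classes):
        p = sum(1 for x in pred if x == c)
        g = sum(1 for y in gt if y == c)
        inter = sum(1 for x, y in zip(pred, gt) if x == c and y == c)
        scores[c] = 1.0 if p + g == 0 else 2.0 * inter / (p + g)
    return scores
```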
Best wishes,
John

Sorry, I am not sure. Please run the code several times and calculate the mean Dice. I'm sorry, but I can't provide more details or information.
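Following the suggestion to run the code several times, per-class results can be summarized as mean ± standard deviation over runs. This makes it easier to judge whether a gap such as 75.97 vs 72.43 exceeds run-to-run variation. Below is a minimal sketch using only the standard library; the function name and the input layout (one list of per-class Dice values per run) are hypothetical.

```python
import statistics
from typing import List, Tuple


def summarize_runs(runs: List[List[float]]) -> List[Tuple[float, float]]:
    """Per-class (mean, sample standard deviation) across repeated runs.

    runs[r][c] is the Dice of class c in run r. At least two runs are needed
    for the standard deviation.
    """
    per_class = list(zip(*runs))
    return [(statistics.mean(v), statistics.stdev(v)) for v in per_class]
```

Comparing the difference between two methods' means with their standard deviations gives a rough sense of whether it is meaningful. With only a few runs this remains an indication, not a significance test.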

OK, thank you. And I will try to run them more times.

You are welcome, good luck to you.