yulequan / UA-MT

Code for the MICCAI 2019 paper 'Uncertainty-aware Self-ensembling Model for Semi-supervised 3D Left Atrium Segmentation'.

Home Page: https://arxiv.org/abs/1907.07034


Does uncertainty help?

JunMa11 opened this issue

JunMa11 commented

Hi @yulequan,

I ran an ablation study on the uncertainty scheme.


Specifically, if we do not use the uncertainty to select the most certain targets and instead use all the voxels in each iteration, will the performance degrade?

To disable the uncertainty-based selection, I simply increased the threshold to 100, so that all the voxels are used to guide the student's learning.

# The threshold ramps up from 0.75*ln(2) to ln(2) over training,
# so fewer voxels are filtered out as the teacher becomes more reliable.
threshold = (0.75+0.25*ramps.sigmoid_rampup(iter_num, max_iterations))*np.log(2)
# Keep only voxels whose predictive entropy is below the threshold.
mask = (uncertainty<threshold).float()
# Masked average of the voxel-wise consistency loss.
consistency_dist = torch.sum(mask*consistency_dist)/(2*torch.sum(mask)+1e-16)

# my modification
threshold = 100 #(0.75+0.25*ramps.sigmoid_rampup(iter_num, max_iterations))*np.log(2)
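For context on what this threshold gates: the uncertainty above is the teacher's voxel-wise predictive entropy, estimated from several stochastic forward passes. A minimal sketch of that estimate, paraphrased rather than copied from the repo (ema_model and volume_batch are assumed names):

import torch
import torch.nn.functional as F

def mc_uncertainty(ema_model, volume_batch, T=8, noise_std=0.1):
    # T noisy forward passes of the teacher (Monte Carlo estimate)
    probs = []
    with torch.no_grad():
        for _ in range(T):
            noise = torch.clamp(torch.randn_like(volume_batch) * noise_std, -0.2, 0.2)
            probs.append(F.softmax(ema_model(volume_batch + noise), dim=1))
    mean_p = torch.stack(probs).mean(dim=0)  # average softmax over the T passes
    # voxel-wise predictive entropy; low entropy = confident teacher target
    return -torch.sum(mean_p * torch.log(mean_p + 1e-6), dim=1, keepdim=True)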

However, the results are odd: the performance does not degrade without uncertainty (there are even small improvements).


I also ran a paired t-test, but found no significant difference (p > 0.05) between training with and without uncertainty.
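For reference, the test itself is one line with SciPy; a minimal sketch, where dice_a and dice_b are hypothetical names for the per-case Dice scores of the two settings over the same test cases:

from scipy import stats

def paired_dice_test(dice_a, dice_b):
    # paired t-test: the i-th entries of both lists must come from the same case
    t_stat, p_value = stats.ttest_rel(dice_a, dice_b)
    return t_stat, p_value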


Could you help me figure out what is wrong with my experiments?

The following link contains all my experiment results (code, trained models, logs...).
Download Link: https://pan.baidu.com/s/1tM6fc_hz3_LE23cLffnFBg
Password: 5p1k

Regarding the source code, I only changed the default seed to 12345.
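For completeness, the seeding I mean is the usual set of calls at the top of the training script; a sketch of the general form, not the repo's exact lines:

import random
import numpy as np
import torch

seed = 12345
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)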

Best regards,
Jun

yulequan commented

Hi Jun,

Have you tried other seeds, or not setting the random seed at all? Since performance varies across different training runs, comparing a single pair of experiment results is not meaningful. You can run the experiments several times with and without the uncertainty scheme and compare the average performance.
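Concretely, the protocol I mean looks like the following sketch, where train_and_eval is a hypothetical helper that trains one model with a given seed and returns its average test Dice:

import numpy as np

def mean_dice_over_seeds(train_and_eval, seeds=(0, 1, 2)):
    # repeat the full training run under each seed and aggregate
    scores = [train_and_eval(seed) for seed in seeds]
    return float(np.mean(scores)), float(np.std(scores))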

JunMa11 commented

Hi @yulequan

Thanks for your quick reply.
Not yet; I will try two more seeds and compare the average performance.

BTW, have you done an ablation study of the uncertainty scheme? If so, how much performance gain did it bring?

yulequan commented

I forget the exact numbers. In our experiments, the uncertainty scheme improves the average Dice on unlabeled data by about 0.5%.

JunMa11 commented

Generally, improvements of less than 1% do not yield a statistically significant difference (p < 0.05); this was the case in the MICCAI KiTS challenge, for example.

In other words, compared to the (Bayesian) vanilla V-Net, the key to the performance improvement is the unlabeled data and the mean teacher framework, right?

yulequan commented

Judging by the p-values, you can think so. In my experiments, I mainly focused on the average Dice and did not calculate p-values.

JunMa11 commented

Got it. Thank you very much.

In addition, I learned a lot from your code, which is well written. I really appreciate that you have made this great work publicly available.

Kindest regards,
Jun