yulequan / UA-MT

Code for the MICCAI 2019 paper 'Uncertainty-aware Self-ensembling Model for Semi-supervised 3D Left Atrium Segmentation'.

Home Page: https://arxiv.org/abs/1907.07034


Does uncertainty help?

JunMa11 opened this issue

JunMa11 commented

Hi @yulequan,

I ran an ablation study on the uncertainty scheme.


Specifically, if we do not use the uncertainty to select the most certain targets and instead use all the voxels in each iteration, will the performance degrade?

To disable the uncertainty-based selection, I simply increased the threshold to 100, so that all the voxels are used to guide the student's learning.

# The threshold ramps up from 0.75*ln(2) to ln(2) over training,
# so fewer voxels are filtered out as the teacher becomes more reliable.
threshold = (0.75+0.25*ramps.sigmoid_rampup(iter_num, max_iterations))*np.log(2)
# Keep only voxels whose predictive entropy is below the threshold.
mask = (uncertainty<threshold).float()
# Masked average of the voxel-wise consistency loss.
consistency_dist = torch.sum(mask*consistency_dist)/(2*torch.sum(mask)+1e-16)

# my modification
threshold = 100 #(0.75+0.25*ramps.sigmoid_rampup(iter_num, max_iterations))*np.log(2)
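For context on what this threshold gates: the uncertainty above is the teacher's voxel-wise predictive entropy, estimated from several stochastic forward passes. A minimal sketch of that estimate, paraphrased rather than copied from the repo (ema_model and volume_batch are assumed names):

import torch
import torch.nn.functional as F

def mc_uncertainty(ema_model, volume_batch, T=8, noise_std=0.1):
    # T noisy forward passes of the teacher (Monte Carlo estimate)
    probs = []
    with torch.no_grad():
        for _ in range(T):
            noise = torch.clamp(torch.randn_like(volume_batch) * noise_std, -0.2, 0.2)
            probs.append(F.softmax(ema_model(volume_batch + noise), dim=1))
    mean_p = torch.stack(probs).mean(dim=0)  # average softmax over the T passes
    # voxel-wise predictive entropy; low entropy = confident teacher target
    return -torch.sum(mean_p * torch.log(mean_p + 1e-6), dim=1, keepdim=True)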

However, the results are odd: the performance does not degrade without uncertainty (there are even small improvements).


I also ran a paired t-test, but found no significant difference (p > 0.05) between training with and without uncertainty.
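For reference, the test itself is one line with SciPy; a minimal sketch, where dice_a and dice_b are hypothetical names for the per-case Dice scores of the two settings over the same test cases:

from scipy import stats

def paired_dice_test(dice_a, dice_b):
    # paired t-test: the i-th entries of both lists must come from the same case
    t_stat, p_value = stats.ttest_rel(dice_a, dice_b)
    return t_stat, p_value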


Could you help me figure out what is wrong with my experiments?

The following link contains all my experiment results (code, trained models, logs...).
Download Link: https://pan.baidu.com/s/1tM6fc_hz3_LE23cLffnFBg
Password: 5p1k

Regarding the source code, I only changed the default seed to 12345.
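For completeness, the seeding I mean is the usual set of calls at the top of the training script; a sketch of the general form, not the repo's exact lines:

import random
import numpy as np
import torch

seed = 12345
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)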

Best regards,
Jun

yulequan commented

Hi Jun,

Have you tried other seeds, or not setting the random seed at all? Since performance varies across different training runs, comparing a single pair of experiment results is not meaningful. You can run the experiments several times with and without the uncertainty scheme and compare the average performance.
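Concretely, the protocol I mean looks like the following sketch, where train_and_eval is a hypothetical helper that trains one model with a given seed and returns its average test Dice:

import numpy as np

def mean_dice_over_seeds(train_and_eval, seeds=(0, 1, 2)):
    # repeat the full training run under each seed and aggregate
    scores = [train_and_eval(seed) for seed in seeds]
    return float(np.mean(scores)), float(np.std(scores))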

JunMa11 commented

Hi @yulequan

Thanks for your quick reply.
Not yet; I will try two more seeds and compare the average performance.

BTW, have you done an ablation study of the uncertainty scheme? If so, how much performance gain did it bring?

yulequan commented

I forget the exact numbers. In our experiments, the uncertainty scheme improves the average Dice on unlabeled data by about 0.5%.

JunMa11 commented

Generally, improvements of less than 1% do not yield a statistically significant difference (p < 0.05); this was the case in the MICCAI KiTS challenge, for example.

In other words, compared to the (Bayesian) vanilla V-Net, the key to the performance improvement is the unlabeled data and the mean teacher framework, right?

yulequan commented

Judging by the p-values, you can think so. In my experiments, I mainly focused on the average Dice and did not calculate p-values.

JunMa11 commented

Got it. Thank you very much.

In addition, I learned a lot from your code, which is well written. I really appreciate that you have made this great work publicly available.

Kindest regards,
Jun