valeoai / ConfidNet

Addressing Failure Prediction by Learning Model Confidence

ConfidNet performs worse than MCP when I reproduce SVHN results

pfjaeger opened this issue · comments

Hi Charles!
Thank you for your nice work on ConfidNet and for providing this framework with your paper. For an upcoming publication, I would like to run ConfidNet as a baseline. However, when I try to reproduce your results on SVHN, ConfidNet performs worse than MCP:

(Screenshot from 2021-04-07: reproduced SVHN results, with ConfidNet scoring below MCP)
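(By MCP I mean the Maximum Class Probability baseline, i.e. using the maximum softmax probability directly as the confidence score. A minimal sketch, with function names of my own choosing:)

```python
import torch

@torch.no_grad()
def mcp_confidence(model, x):
    """Maximum Class Probability: the max softmax probability per input,
    used directly as a confidence score for failure prediction."""
    probs = torch.softmax(model(x), dim=1)
    return probs.max(dim=1).values
```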

I understand that results are volatile due to the limited number of incorrect predictions, but I tried multiple runs and always got the same performance pattern. So here is what I did exactly:

  • I run the standard exp_svhn.yaml and select the best epoch according to val-accuracy
  • I run ConfidNet training with the following config (the loss it uses is sketched after it):

```yaml
# Data parameters
data:
  dataset: svhn
  data_dir: /media/paul/ssd1/datasets/svhn
  input_size: [32, 32]
  input_channels: 3
  num_classes: 10
  valid_size: 0.1

# Training parameters
training:
  output_folder: /mnt/hdd2/checkpoints/confid_test/svhn_smallconv_run_selfconfid
  task: classification
  learner: selfconfid
  nb_epochs: 200
  batch_size: 128
  loss:
    name: selfconfid_mse
    weighting: 1
  optimizer:
    name: adam
    lr: 0.0001
    weight_decay: 0.0001
  lr_schedule:
  ft_on_val: False
  metrics: ['accuracy', 'auc', 'ap_success', 'ap_errors']
  pin_memory: False
  num_workers: 12
  augmentations:
    normalize: [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]

# Model parameters
model:
  name: small_convnet_svhn_selfconfid_classic
  resume: /mnt/hdd2/checkpoints/confid_test/svhn_smallconv_run/model_epoch_052.ckpt # best val-acc of previous encoder-classifier training
  feature_dim: 512
  uncertainty:
```
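For reference, my understanding of the `selfconfid_mse` loss named above: ConfidNet's confidence branch regresses the True Class Probability (TCP), i.e. the softmax probability the classifier assigns to the ground-truth class, with an MSE objective. A rough sketch (function and tensor names are mine, not the repo's):

```python
import torch.nn.functional as F

def selfconfid_mse(confidence, logits, target):
    """MSE regression of the predicted confidence onto the True Class
    Probability (TCP), as described in the paper."""
    probs = F.softmax(logits, dim=1)
    tcp = probs.gather(1, target.unsqueeze(1)).squeeze(1)  # p(y* | x)
    return F.mse_loss(confidence, tcp.detach())
```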

  • I select the best epoch according to val-aupr-err
  • I run fine-tuning with the following config (the cloning idea is sketched after it):

```yaml
# Data parameters
data:
  dataset: svhn
  data_dir: /media/paul/ssd1/datasets/svhn
  input_size: [32, 32]
  input_channels: 3
  num_classes: 10
  valid_size: 0.1

# Training parameters
training:
  output_folder: /mnt/hdd2/checkpoints/confid_test/svhn_smallconv_run_finetune
  task: classification
  learner: selfconfid
  nb_epochs: 20
  batch_size: 128
  loss:
    name: selfconfid_mse
    weighting: 1
  optimizer:
    name: adam
    lr: 0.0000001 # 1e-7
  lr_schedule:
  ft_on_val: False
  metrics: ['accuracy', 'auc', 'ap_success', 'ap_errors']
  pin_memory: False
  num_workers: 12
  augmentations:
    normalize: [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]

# Model parameters
model:
  name: small_convnet_svhn_selfconfid_cloning
  resume: /mnt/hdd2/checkpoints/confid_test/svhn_smallconv_run/model_epoch_052.ckpt # best val-acc of previous encoder-classifier training
  feature_dim: 512
  uncertainty: /mnt/hdd2/checkpoints/confid_test/svhn_smallconv_run_selfconfid/model_epoch_111.ckpt # best AUPR-error of previous ConfidNet training
```
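My mental model of the `cloning` stage, sketched below: ConfidNet gets its own copy of the classifier's encoder to fine-tune at a very small learning rate, while the original classifier stays frozen so test accuracy is unchanged. The function, class, and attribute names here are placeholders, not the repo's actual API:

```python
import copy
import torch

def build_finetune_optimizer(classifier, confidnet_head, lr=1e-7):
    """Rough sketch of the 'cloning' fine-tuning stage. `classifier` is
    assumed to expose an `.encoder` submodule; both argument names are
    placeholders."""
    uncertainty_encoder = copy.deepcopy(classifier.encoder)  # cloned encoder
    for p in classifier.parameters():
        p.requires_grad = False  # freeze the original classifier
    optimizer = torch.optim.Adam(
        list(uncertainty_encoder.parameters()) + list(confidnet_head.parameters()),
        lr=lr,  # 1e-7, matching the config above
    )
    return uncertainty_encoder, optimizer
```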

  • Again, I select the epoch with the best val-aupr-err for testing (criterion sketched below)
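For completeness, this is how I understand the val-aupr-err criterion: misclassifications are the positive class, ranked by decreasing uncertainty. A minimal sketch using scikit-learn (my naming):

```python
import numpy as np
from sklearn.metrics import average_precision_score

def aupr_error(confidences, correct):
    """Average precision with misclassifications as the positive class,
    using negated confidence so low confidence should flag errors."""
    errors = 1 - np.asarray(correct)  # 1 where the prediction is wrong
    return average_precision_score(errors, -np.asarray(confidences))
```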

I would like to make sure I am using your code correctly so that I do not report unfair baseline results. It would be great if you could give me feedback on this; thanks in advance!

Hi Paul,

Thank you for your interest in the paper!

Looking back at my configuration files, I've noticed that using no weight decay during ConfidNet training helped to obtain better performance. You should try without it in your config file; the rest seems fine to me.
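Concretely, the suggestion amounts to dropping the `weight_decay` entry from the optimizer section of your ConfidNet training config, something like:

```yaml
optimizer:
  name: adam
  lr: 0.0001
  # no weight_decay entry, i.e. no L2 regularization during ConfidNet training
```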

Please let me know if it helped :)

Charles

Thank you for your quick reply!
Good hint with the weight decay; this helped indeed. After experiments on all datasets and multiple runs, I can see that ConfidNet is often better than MCP, although with limited consistency: results/rankings seem very volatile and dependent on the particular run, train split, etc. Also: the MCP of the dropout-based mean softmax seems to be strong competition for ConfidNet ;)
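For anyone reading along, by "dropout-based mean softmax" I mean roughly the following (a sketch, not the repo's implementation; `model` and `n_samples` are my placeholders):

```python
import torch

@torch.no_grad()
def mc_dropout_mcp(model, x, n_samples=20):
    """Mean-softmax MCP under MC dropout: keep dropout active at test time,
    average the softmax over several stochastic passes, then take the max."""
    model.train()  # re-enables dropout (note: also affects BatchNorm layers)
    probs = torch.stack(
        [torch.softmax(model(x), dim=1) for _ in range(n_samples)]
    ).mean(dim=0)
    return probs.max(dim=1).values
```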