google / uncertainty-baselines

High-quality implementations of standard and SOTA methods on a variety of tasks.


Reproducing accuracy

Johswald opened this issue

Hey! Can you share a configuration to reproduce the reported results for BatchEnsemble (BE) with WRN-28-10 on CIFAR-100?
If I run the script as is, the accuracies are worse. This is the script: https://github.com/google/uncertainty-baselines/blob/master/baselines/cifar/batchensemble.py

Thank you!

Hey @Johswald! Thanks for asking. Have you verified that you're running under 8 accelerators? That's the exact setting in the default flags, which reproduces the result (we've checked on cloud TPUs). This is often the culprit, and may suggest we should somehow raise an error if, e.g., the system's # accelerators doesn't match FLAGS.num_cores.
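
A minimal sketch of what such a check could look like (hypothetical helper, assuming a TF2-style device query; this is not code that exists in the repo):

```python
# Hypothetical sanity check (not part of the repo): fail fast when the number
# of visible accelerators does not match the value passed via --num_cores.
import tensorflow as tf


def check_num_cores(expected_num_cores):
  """Raises if the detected accelerator count differs from the flag value."""
  gpus = tf.config.list_physical_devices('GPU')
  tpus = tf.config.list_physical_devices('TPU')
  num_detected = len(tpus) if tpus else len(gpus)
  if num_detected != expected_num_cores:
    raise ValueError(
        f'--num_cores={expected_num_cores}, but {num_detected} accelerator(s) '
        'were detected. The reported results assume the flag matches the '
        'hardware, since it determines the global batch size.')


# Example: call near the start of main(), e.g. check_num_cores(FLAGS.num_cores).
```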

Hey @dustinvtran
No, I am just running it on a single GPU, which is also quite fast. I can try to parallelise across the 8 GPUs on my node if you think this would make a difference? Algorithmically it should not, right? (No experience with TPUs :) )

I was asking about the config because it was not clear whether it is a CIFAR-10 or a CIFAR-100 config. So the default one in batchensemble.py is the CIFAR-100 config that gets 81.9% test accuracy (on TPUs)? Are you using the same config for CIFAR-10 and CIFAR-100?
Thank you again!

All the CIFAR baselines use num_cores=8 currently, which means 8 GPUs. This affects the global batch size for each gradient step. The default flag values are for CIFAR-10, and I think you just need to change FLAGS.dataset to cifar100.
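
To make the batch-size dependence concrete, here is a rough sketch (the per_core_batch_size name and its value are assumptions for illustration; only num_cores and dataset are confirmed above):

```python
# Rough sketch, not the repo's exact code: the global batch size per gradient
# step scales with the number of cores. per_core_batch_size is an assumed
# flag/default; num_cores=8 is the default mentioned above.
per_core_batch_size = 64
num_cores = 8

global_batch_size = per_core_batch_size * num_cores
print(global_batch_size)  # 512 on 8 accelerators, versus 64 on a single GPU
```

So with unchanged flags, a single-GPU run effectively trains with a much smaller global batch size than the reported configuration, which is likely where the accuracy gap comes from.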

Ah of course, thanks a lot! I will get back to you once the runs are done.

Closing due to inactivity, please reopen if needed!