Disparities between NASBench-201 and NATS-Bench papers
Mirofil opened this issue
Hello,
I noticed that the final accuracies of the searched models in the NATS-Bench paper are generally quite a bit higher than in the original NAS-Bench-201 paper, especially for the weight-sharing methods. I assume that is because the hyperparameters for those were changed (I see there is a note about that in the second paper)?
Furthermore, the NAS-Bench-201 paper has Table 6 in the appendix, which shows ~90% correlation between the 12-epoch training protocol and the 200-epoch training performance. However, when I try to reproduce it on NATS-Bench, I get only ~80%. Do you know whether that is intended?
Thanks in advance
Thanks for your interest.
For the first question, there are four reasons:
- In NATS-Bench, we search on every dataset, e.g., we search on CIFAR-100 and report the searched architecture's CIFAR-100 accuracy. However, in NAS-Bench-201, we only search on CIFAR-10 and report the performance of the CIFAR-10-searched architecture on all three datasets.
- Some hyperparameters and implementation details are upgraded.
- For multi-trial-based search methods, we set a different time budget on CIFAR-100 / ImageNet-16-120. See the caption of Figure 7.
- The benchmark files are updated.
For the second question, I just gave it a try on my side, and it is about 91%. Please see my demo code here: https://github.com/D-X-Y/AutoDL-Projects/blob/main/notebooks/NATS-Bench/issue-96.ipynb
Let me know if you have any questions.
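For anyone trying to reproduce this comparison, here is a minimal sketch. It assumes the `nats_bench` package is installed and a local copy of the topology-search-space benchmark file is available; `BENCH_FILE` is a placeholder path you must point at your own download, and the `pearson` helper is our own, not part of the library:

```python
# Sketch: correlate 12-epoch vs 200-epoch CIFAR-10 test accuracy on NATS-Bench.
# BENCH_FILE is a placeholder; the nats_bench API calls (create, get_more_info)
# follow the library's documented usage but are only exercised if the file exists.
import os
import numpy as np

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

BENCH_FILE = "path/to/NATS-tss-benchmark"  # placeholder: your local benchmark copy

if os.path.exists(BENCH_FILE):
    from nats_bench import create
    # 'tss' = topology search space; fast_mode avoids loading everything at once
    api = create(BENCH_FILE, "tss", fast_mode=True, verbose=False)
    acc12, acc200 = [], []
    for index in range(len(api)):
        acc12.append(api.get_more_info(index, "cifar10", hp="12")["test-accuracy"])
        acc200.append(api.get_more_info(index, "cifar10", hp="200")["test-accuracy"])
    print("Pearson(12ep, 200ep) =", pearson(acc12, acc200))
```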
Hello,
Thanks for the detailed answer. It appears the issue is that I was tracking the Spearman correlation rather than Pearson; with Spearman, the correlation is slightly lower, at around 80%. With Pearson, I was able to get the 90+% as you did.
Thanks!
Good to know! Yes, a rank correlation (e.g., Spearman) is more suitable for this case, which is why we have switched to the Kendall rank correlation coefficient in NATS-Bench :)
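To illustrate why the choice of coefficient matters here, the toy example below (with made-up accuracy numbers, not NATS-Bench data) computes all three coefficients by hand: Pearson measures linear fit, while Spearman and Kendall measure rank agreement, so a few well-aligned extreme points can push Pearson above the rank-based scores, just like the gap observed above. The helpers assume no tied values:

```python
# Toy comparison of Pearson vs Spearman vs Kendall on invented accuracy lists.
import numpy as np

def pearson(x, y):
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def ranks(x):
    # Rank 1 for the smallest value, etc. (no ties assumed).
    order = np.argsort(x)
    r = np.empty(len(x))
    r[order] = np.arange(1, len(x) + 1)
    return r

def spearman(x, y):
    # Spearman is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    # Kendall tau-a: (concordant - discordant) / number of pairs (no ties assumed).
    n, s = len(x), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign((x[i] - x[j]) * (y[i] - y[j]))
    return float(s) / (n * (n - 1) / 2)

# Invented 12-epoch vs 200-epoch test accuracies for six architectures:
a12 = [10, 20, 30, 40, 50, 95]
a200 = [12, 31, 22, 41, 49, 94]
print("Pearson :", round(pearson(a12, a200), 3))
print("Spearman:", round(spearman(a12, a200), 3))
print("Kendall :", round(kendall(a12, a200), 3))
```

On this data Pearson comes out highest and Kendall lowest, even though all three see the same single rank swap; that is the sense in which a rank correlation is the stricter, more suitable summary for "does the 12-epoch proxy order architectures correctly?".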