Disparities between NASBench-201 and NATS-Bench papers
Mirofil opened this issue
Hello,
I noticed that the final accuracies of the searched models in the NATS-Bench paper are generally quite a bit higher than in the original NAS-Bench-201 paper, especially for the weight-sharing methods. I assume that is because the hyperparameters for those were changed (I see there is a note about that in the second paper)?
Furthermore, the NAS-Bench-201 paper has Table 6 in the appendix, which shows ~90% correlation between the 12-epoch training protocol and the 200-epoch training performance. However, when I try to reproduce it on NATS-Bench, I get only ~80%. Do you know whether that is intended?
Thanks in advance
Thanks for your interest.
For the first question, there are four reasons:
- In NATS-Bench, we search on every dataset, e.g., we search on CIFAR-100 and report the searched architecture's CIFAR-100 accuracy. However, in NAS-Bench-201, we only search on CIFAR-10 and report the performance of the CIFAR-10-searched architecture on all three datasets.
- Some hyperparameters and implementation details are upgraded.
- For multi-trial-based search methods, we set a different time budget on CIFAR-100 / ImageNet-16-120. See the caption of Figure 7.
- The benchmark files are updated.
For the second question, I just gave it a try on my side, and it is about 91%. Please see my demo code here: https://github.com/D-X-Y/AutoDL-Projects/blob/main/notebooks/NATS-Bench/issue-96.ipynb
Let me know if you have any questions.
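For anyone trying to reproduce this comparison, here is a minimal sketch. It assumes the `nats_bench` package is installed and a local copy of the topology-search-space benchmark file is available; `BENCH_FILE` is a placeholder path you must point at your own download, and the `pearson` helper is our own, not part of the library:

```python
# Sketch: correlate 12-epoch vs 200-epoch CIFAR-10 test accuracy on NATS-Bench.
# BENCH_FILE is a placeholder; the nats_bench API calls (create, get_more_info)
# follow the library's documented usage but are only exercised if the file exists.
import os
import numpy as np

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

BENCH_FILE = "path/to/NATS-tss-benchmark"  # placeholder: your local benchmark copy

if os.path.exists(BENCH_FILE):
    from nats_bench import create
    # 'tss' = topology search space; fast_mode avoids loading everything at once
    api = create(BENCH_FILE, "tss", fast_mode=True, verbose=False)
    acc12, acc200 = [], []
    for index in range(len(api)):
        acc12.append(api.get_more_info(index, "cifar10", hp="12")["test-accuracy"])
        acc200.append(api.get_more_info(index, "cifar10", hp="200")["test-accuracy"])
    print("Pearson(12ep, 200ep) =", pearson(acc12, acc200))
```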
Hello,
Thanks for the detailed answer. It appears the issue is that I was tracking the Spearman correlation rather than Pearson; with Spearman, the correlation is slightly lower, at around 80%. With Pearson, I was able to get the 90+% as you did.
Thanks!
Good to know! Yes, a rank correlation (e.g., Spearman) is more suitable for this case, which is why we have switched to the Kendall rank correlation coefficient in NATS-Bench :)
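To illustrate why the choice of coefficient matters here, the toy example below (with made-up accuracy numbers, not NATS-Bench data) computes all three coefficients by hand: Pearson measures linear fit, while Spearman and Kendall measure rank agreement, so a few well-aligned extreme points can push Pearson above the rank-based scores, just like the gap observed above. The helpers assume no tied values:

```python
# Toy comparison of Pearson vs Spearman vs Kendall on invented accuracy lists.
import numpy as np

def pearson(x, y):
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def ranks(x):
    # Rank 1 for the smallest value, etc. (no ties assumed).
    order = np.argsort(x)
    r = np.empty(len(x))
    r[order] = np.arange(1, len(x) + 1)
    return r

def spearman(x, y):
    # Spearman is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    # Kendall tau-a: (concordant - discordant) / number of pairs (no ties assumed).
    n, s = len(x), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign((x[i] - x[j]) * (y[i] - y[j]))
    return float(s) / (n * (n - 1) / 2)

# Invented 12-epoch vs 200-epoch test accuracies for six architectures:
a12 = [10, 20, 30, 40, 50, 95]
a200 = [12, 31, 22, 41, 49, 94]
print("Pearson :", round(pearson(a12, a200), 3))
print("Spearman:", round(spearman(a12, a200), 3))
print("Kendall :", round(kendall(a12, a200), 3))
```

On this data Pearson comes out highest and Kendall lowest, even though all three see the same single rank swap; that is the sense in which a rank correlation is the stricter, more suitable summary for "does the 12-epoch proxy order architectures correctly?".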