D-X-Y / AutoDL-Projects

Automated deep learning algorithms implemented in PyTorch.

Disparities between NASBench-201 and NATS-Bench papers

Mirofil opened this issue.


I noticed that the final accuracies of the searched models in the NATS-Bench paper are generally quite a bit higher than in the original NAS-Bench-201 paper, especially for the weight-sharing methods. I assume that is because their hyperparameters were changed (I see there is a note about that in the second paper)?

Furthermore, Table 6 in the Appendix of the NAS-Bench-201 paper shows a ~90% correlation between the performance under the 12-epoch training protocol and the performance under the 200-epoch protocol. However, when I try to reproduce it on NATS-Bench, I only get ~80%. Do you know whether that is expected?

Thanks in advance

Thanks for your interest.

For the first question, there are four reasons:

  • In NATS-Bench, we search on every dataset; e.g., we search on CIFAR-100 and report the searched architecture's CIFAR-100 accuracy. In NAS-Bench-201, however, we only search on CIFAR-10 and report the performance of the CIFAR-10-searched architecture on all three datasets.
  • Some hyperparameters and implementation details have been upgraded.
  • For multi-trial-based search methods, we set a different time budget on CIFAR-100 / ImageNet-16-120. See the caption of Figure 7.
  • The benchmark files have been updated.

For the second question, I just tried it on my side and got about 91%. Please see my demo code here: https://github.com/D-X-Y/AutoDL-Projects/blob/main/notebooks/NATS-Bench/issue-96.ipynb
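
A minimal sketch of the Pearson computation being discussed, assuming you already have the test accuracies of the same architectures under the 12-epoch and the 200-epoch protocols as two aligned lists. The linked notebook is the authoritative version; the `pearson` helper and the toy numbers below are hypothetical and are not benchmark data:

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the common 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical test accuracies (%) of the same five architectures.
# Index i in both lists must refer to the same architecture.
acc_12 = [55.0, 60.0, 65.0, 70.0, 75.0]    # 12-epoch protocol
acc_200 = [80.0, 84.0, 88.0, 92.0, 96.0]   # 200-epoch protocol
print(f"Pearson: {pearson(acc_12, acc_200):.3f}")
```

Here the toy 200-epoch accuracies are an exact linear function of the 12-epoch ones, so the coefficient is 1.0; real benchmark data would of course give a lower value.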

Let me know if you have any questions.


Thanks for the detailed answer. It appears the issue was that I was tracking the Spearman correlation rather than the Pearson correlation; with Spearman, the correlation is lower, at around 80%. With Pearson, I was able to get 90+% as you did.
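
Spearman correlation is the Pearson correlation of the ranks, so it ignores how far apart the accuracies are and only checks whether the two protocols order the architectures the same way. The hypothetical sketch below (helper names and toy numbers are illustrative, not NATS-Bench results) shows how a couple of weak architectures can push Pearson close to 1 while the ordering of the strong architectures, which is what matters for NAS, is poorly preserved:

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values starting at position `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Toy accuracies (%): two weak architectures plus four strong ones whose
# order is reversed between the short and the full training protocol.
acc_short = [10.0, 20.0, 70.0, 71.0, 72.0, 73.0]
acc_full = [15.0, 25.0, 74.0, 73.0, 72.0, 71.0]
print(f"Pearson:  {pearson(acc_short, acc_full):.3f}")
print(f"Spearman: {spearman(acc_short, acc_full):.3f}")
```

With these lists, Pearson is about 0.997 while Spearman is 3/7 ≈ 0.43: the large gap between the weak and strong groups dominates Pearson, whereas Spearman penalizes the reversed ordering at the top.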


Good to know! Yep, a rank correlation (e.g., Spearman) is more suitable for this case, which is why we switched to the Kendall rank correlation coefficient in NATS-Bench :)
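
For reference, a small sketch of the Kendall rank correlation in its tau-b variant, which is what `scipy.stats.kendalltau` computes by default; in practice calling that function directly is simpler. The `kendall_tau_b` name and the toy data are illustrative only, and NATS-Bench's own evaluation code may compute the coefficient differently. It counts architecture pairs that the two protocols order the same way (concordant) or oppositely (discordant):

```python
import math
from itertools import combinations
from typing import Sequence


def kendall_tau_b(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Kendall's tau-b between two equal-length sequences (O(n^2) pair scan)."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(xs)), 2):
        dx = xs[i] - xs[j]
        dy = ys[i] - ys[j]
        if dx == 0 and dy == 0:
            continue  # tied in both sequences: counted in neither tie term
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1  # both sequences order this pair the same way
        else:
            discordant += 1  # the two sequences disagree on this pair
    n_pairs_x = concordant + discordant + ties_x_only
    n_pairs_y = concordant + discordant + ties_y_only
    if n_pairs_x == 0 or n_pairs_y == 0:
        raise ValueError("tau is undefined when one sequence is constant")
    return (concordant - discordant) / math.sqrt(n_pairs_x * n_pairs_y)


# Toy accuracies (%) of six hypothetical architectures under two protocols.
acc_short = [10.0, 20.0, 70.0, 71.0, 72.0, 73.0]
acc_full = [15.0, 25.0, 74.0, 73.0, 72.0, 71.0]
print(f"Kendall tau-b: {kendall_tau_b(acc_short, acc_full):.3f}")
```

With these toy lists there are no ties; 9 of the 15 pairs are concordant and 6 are discordant, giving tau = (9 - 6) / 15 = 0.2.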