khanrc / swad

Official Implementation of SWAD (NeurIPS 2021)

Results on PACS and VLCS

alexrame opened this issue

Thank you for the interesting work.
Reading your paper, I noticed that you report 88.1 for PACS in Table 2 but 87.1 in the ablation study (Table 5).
Similarly, you report 79.1 for VLCS in Table 2 but 78.9 in Table 5.
What causes these differences?

PS: It would be very helpful if you could include the SWAD-specific hyperparameters in hparams_registry.py, to facilitate future comparisons.

The two sets of results use different dataset splits. For Table 5 (and Table 3), we split each in-domain dataset 6:2:2 into training:validation:test so that we can report in-domain performance. The other experiments have no in-domain test set (this is the default DomainBed setting), so there we split each dataset 8:2 into training:validation.
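To make the difference concrete, here is a minimal sketch of the two splitting schemes. It is not the code from this repository or from DomainBed: the helper `split_indices`, the seed, and the dataset size of 1000 are all hypothetical and chosen only for illustration.

```python
import random


def split_indices(n, fractions, seed=0):
    """Shuffle range(n) and cut it into consecutive chunks sized by `fractions`.

    Hypothetical helper for illustration; not part of SWAD or DomainBed.
    """
    assert abs(sum(fractions) - 1.0) < 1e-9, "fractions must sum to 1"
    indices = list(range(n))
    random.Random(seed).shuffle(indices)

    splits, start = [], 0
    for i, frac in enumerate(fractions):
        # The last chunk takes the remainder so rounding never drops a sample.
        end = n if i == len(fractions) - 1 else start + int(round(frac * n))
        splits.append(indices[start:end])
        start = end
    return splits


# Default setting described above: 8:2 training:validation, no in-domain test set.
default_train, default_val = split_indices(1000, (0.8, 0.2))

# In-domain setting described above: 6:2:2 training:validation:test.
abl_train, abl_val, abl_test = split_indices(1000, (0.6, 0.2, 0.2))
```

Under the 6:2:2 scheme each source domain contributes fewer training samples than under 8:2, so models in the two settings are not trained on the same data, which is consistent with the numbers in the two tables not matching.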