khanrc / swad

Official Implementation of SWAD (NeurIPS 2021)


Hyperparameter search protocol

optharry opened this issue

Hi,

Thanks for providing this great repo! I have a question about the hyperparameter search protocol. In Section B.2 of the paper (Table 7), you indicate that the search space is constrained compared to the original DomainBed, but there are still random choices for learning rate, weight decay, etc. However, if I understand correctly, the current implementation only uses the default hparams, with no randomness involved; the only randomness comes from different trials. Is this correct? If so, how were the original experiments in the paper conducted?

Further, how is the number 396 below computed?

Through the proposed protocol, we find HP for an algorithm under only 396 runs.

Thanks in advance.

In our constrained HP space, we simply tried all combinations: combining the three learning rates, three dropout rates, and two weight decays gives 3 × 3 × 2 = 18 combinations, each of which was tested, and the best HP by validation accuracy was selected. We conducted this HP search manually, without any dedicated HP-search code. Since there are 5 benchmarks, train_all.py has to be run 18 × 5 = 90 times. Because each train_all.py run covers all target domains of a benchmark at once, this amounts to 18 × 22 = 396 experiments (22 is the total number of domains across the 5 benchmarks).

Note that the current implementation supports overriding hparams from the CLI. For example, you can set the learning rate to 3e-5 with python train_all.py example --lr 3e-5.
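For illustration, here is a minimal sketch of that manual grid search as a script. The grid values are taken from the HP tables in this thread, and `--lr` is the flag shown above; `--dataset`, `--dropout`, and `--weight_decay` are assumed flag names for this sketch, not confirmed parts of the repo's CLI.

```python
# Sketch of the exhaustive grid search described above (assumptions noted below).
import itertools
import subprocess

learning_rates = [1e-5, 3e-5, 5e-5]  # 3 values
dropout_rates = [0.0, 0.1, 0.5]      # 3 values
weight_decays = [1e-4, 1e-6]         # 2 values
benchmarks = ["PACS", "VLCS", "OfficeHome", "TerraIncognita", "DomainNet"]

# 3 * 3 * 2 = 18 combinations per benchmark -> 18 * 5 = 90 launches.
# Each launch trains on every target domain of its benchmark at once,
# so the total is 18 * 22 = 396 single-target experiments
# (22 domains across the 5 benchmarks).
for lr, dropout, wd in itertools.product(learning_rates, dropout_rates, weight_decays):
    for bench in benchmarks:
        subprocess.run([
            "python", "train_all.py", f"grid_{bench}_{lr}_{dropout}_{wd}",
            "--dataset", bench,         # assumed flag name
            "--lr", str(lr),            # flag confirmed in this thread
            "--dropout", str(dropout),  # assumed flag name
            "--weight_decay", str(wd),  # assumed flag name
        ], check=True)
```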

Thanks for your quick reply. A follow-up question: is the best validation HP the same as the default one in the current repo (the default seems to be the same as in DomainBed)? If not, would you mind releasing the best validation HPs for some algorithms, such as ERM? It would be very helpful to see the rough best HP values for each dataset.

As written in the paper, for SWAD we tuned only the tolerance ratio and kept the default HPs. You can reproduce the reported results with the provided commands.

Our best validation HPs of the other algorithms are:

| Algorithm | PACS | VLCS | OfficeHome | TerraIncognita | DomainNet |
|---|---|---|---|---|---|
| ERM | (1e-5, 0.1, 1e-4) | (1e-5, 0.1, 1e-4) | (1e-5, 0.5, 1e-6) | (3e-5, 0.0, 1e-4) | (3e-5, 0.5, 1e-6) |
| Mixstyle | (1e-5, 0.1, 1e-6) | (1e-5, 0.1, 1e-4) | (1e-5, 0.5, 1e-6) | (3e-5, 0.0, 1e-4) | (3e-5, 0.1, 1e-6) |
| SAM | (3e-5, 0.5, 1e-4) | (1e-5, 0.5, 1e-4) | (1e-5, 0.5, 1e-6) | (5e-5, 0.1, 1e-6) | (5e-5, 0.1, 1e-6) |

Each cell shows the best validation HPs as (learning rate, dropout rate, weight decay).
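For scripting convenience, the table can be transcribed as a plain Python mapping (the name `BEST_HPS` is illustrative, not part of the repo; the tuples are exactly those above):

```python
# Best validation HPs as (learning rate, dropout rate, weight decay),
# transcribed from the table above.
BEST_HPS = {
    "ERM": {
        "PACS": (1e-5, 0.1, 1e-4),
        "VLCS": (1e-5, 0.1, 1e-4),
        "OfficeHome": (1e-5, 0.5, 1e-6),
        "TerraIncognita": (3e-5, 0.0, 1e-4),
        "DomainNet": (3e-5, 0.5, 1e-6),
    },
    "Mixstyle": {
        "PACS": (1e-5, 0.1, 1e-6),
        "VLCS": (1e-5, 0.1, 1e-4),
        "OfficeHome": (1e-5, 0.5, 1e-6),
        "TerraIncognita": (3e-5, 0.0, 1e-4),
        "DomainNet": (3e-5, 0.1, 1e-6),
    },
    "SAM": {
        "PACS": (3e-5, 0.5, 1e-4),
        "VLCS": (1e-5, 0.5, 1e-4),
        "OfficeHome": (1e-5, 0.5, 1e-6),
        "TerraIncognita": (5e-5, 0.1, 1e-6),
        "DomainNet": (5e-5, 0.1, 1e-6),
    },
}
```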

For reference, we also performed a similar HP search for SWAD (with a slightly modified HP space for the tolerance ratio), and it was slightly better than the results reported in the paper (where only the tolerance ratio was tuned). These results are not included in the paper because we obtained them after the paper was posted to arXiv, and the improvement is marginal compared with the heavy computational cost of the HP search.

Thanks!