khanrc / swad

Official Implementation of SWAD (NeurIPS 2021)

DomainBed result reproducibility

AmirEstiri opened this issue

Hi,
I have been trying to reproduce the results of your paper on the DomainBed benchmark.
From what I understand, your code (train_all.py) only runs cases with a single test domain, which does not match the DomainBed benchmark (DomainBed runs all scenarios, including multiple test domains with a single train domain).
This makes SWAD's results incomparable to those of other methods implemented in DomainBed.
This is the part of the code I am referring to (train_all.py, line 144):
args.test_envs = [[te] for te in range(len(dataset))]
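
(For reference, a minimal illustration, not the repo's code: assuming a 4-domain dataset such as PACS, this comprehension produces single-test-domain splits only.)

# Illustration only: each run holds out exactly one domain.
test_envs = [[te] for te in range(4)]  # assuming len(dataset) == 4
print(test_envs)  # [[0], [1], [2], [3]]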

Could you please elaborate on this issue?

Thanks in advance

DomainBed does not run all scenarios. Its runs only additionally include the two-test-domain cases used by the leave-one-domain-out validation method. Our code does not support leave-one-domain-out, so the two-test-domain cases are not needed.
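
(For illustration, a minimal sketch of the test-environment sets a DomainBed-style sweep covers; the variable names and n_envs value are hypothetical, and the two-domain pairs are the extra cases needed only for leave-one-domain-out validation.)

import itertools

n_envs = 4  # assumed, e.g. PACS has 4 domains
single = [[te] for te in range(n_envs)]  # one held-out test domain per run
pairs = [list(p) for p in itertools.combinations(range(n_envs), 2)]  # extra cases for leave-one-domain-out
test_env_sets = single + pairs  # what a full DomainBed-style sweep would enumerate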

From what I understood, leave-one-domain-out only works with one test domain, but the other model selection methods support multiple test domains (as mentioned here).
I think your method only supports leave-one-domain-out.
Please correct me if I'm wrong.

Leave-one-domain-out requires two test domains, while the others (oracle and training-validation) use only one test domain, as mentioned in the link you referred to. Our main target is training-validation model selection.

I appreciate you answering the questions.
We are trying to reproduce the SWAD results under leave-one-domain-out and oracle model selection.
Do you have a script we could use for that?
Thank you for your help.

As stated earlier, our results come from the training-validation method; we do not report results for leave-one-domain-out or oracle selection. What do you mean by "reproduce" here?

p.s. I'm on vacation until 8/7, so my responses may be delayed.

I understand that your code only supports the training-validation method.
However, I was wondering how we can get results for the other model selection techniques.
Could you share those results if you have access to them, or help us modify the SWAD code to obtain results for the other model selection methods?

SWAD chooses the averaging range by training-validation loss in https://github.com/khanrc/swad/blob/main/domainbed/trainer.py#L250. You can plug a different loss (oracle or leave-one-domain-out loss) in here. However, please note that SWAD is designed around the training-validation loss; we do not know how it performs with a different loss.
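
(A loose sketch of that idea, not the repo's actual implementation; the function name and tolerance value are assumptions. The point is that the averaging window is chosen from a per-step validation-loss sequence, so substituting an oracle or leave-one-domain-out loss only changes which split the losses come from.)

def choose_window(losses, r=1.2):
    # Start the averaging window at the loss minimum; end it once the
    # loss rises above r times that minimum (a simple "loss valley").
    start = min(range(len(losses)), key=losses.__getitem__)
    end = len(losses)
    for t in range(start + 1, len(losses)):
        if losses[t] > r * losses[start]:
            end = t
            break
    return start, end

# Example: the validation loss dips then rises; the window spans the valley.
val_losses = [1.0, 0.8, 0.6, 0.55, 0.6, 0.75, 0.9]
print(choose_window(val_losses))  # (3, 5)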

Thank you for your clarification!