khanrc / swad

Official Implementation of SWAD (NeurIPS 2021)

DomainBed result reproducibility

AmirEstiri opened this issue

Hi,
I have been trying to reproduce the results of your paper on the DomainBed benchmark.
From what I understand, your code (train_all.py) only runs cases with a single test domain, which does not match the DomainBed benchmark (DomainBed runs all scenarios, including multiple test domains with a single train domain).
This makes SWAD's results incomparable to those of other methods implemented in DomainBed.
This is the part of the code I am referring to (train_all.py, line 144):
args.test_envs = [[te] for te in range(len(dataset))]
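
(For reference, a minimal illustration, not the repo's code: assuming a 4-domain dataset such as PACS, this comprehension produces single-test-domain splits only.)

# Illustration only: each run holds out exactly one domain.
test_envs = [[te] for te in range(4)]  # assuming len(dataset) == 4
print(test_envs)  # [[0], [1], [2], [3]]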

Could you please elaborate on this issue?

Thanks in advance

DomainBed does not run all scenarios. Its runs only additionally include the two-test-domain cases used by the leave-one-domain-out validation method. Our code does not support leave-one-domain-out, so the two-test-domain cases are not needed.
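
(For illustration, a minimal sketch of the test-environment sets a DomainBed-style sweep covers; the variable names and n_envs value are hypothetical, and the two-domain pairs are the extra cases needed only for leave-one-domain-out validation.)

import itertools

n_envs = 4  # assumed, e.g. PACS has 4 domains
single = [[te] for te in range(n_envs)]  # one held-out test domain per run
pairs = [list(p) for p in itertools.combinations(range(n_envs), 2)]  # extra cases for leave-one-domain-out
test_env_sets = single + pairs  # what a full DomainBed-style sweep would enumerate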

From what I understood, leave-one-domain-out only works with one test domain, but the other model selection methods support multiple test domains (as mentioned here).
I think your method only supports leave-one-domain-out.
Please correct me if I'm wrong.

Leave-one-domain-out requires two test domains, while the others (oracle and training-validation) use only one test domain, as mentioned in the link you referred to. Our main target is training-validation model selection.

I appreciate you answering the questions.
We are trying to reproduce the SWAD results under leave-one-domain-out and oracle model selection.
Do you have a script we could use for that?
Thank you for your help.

As stated earlier, our results come from the training-validation method; we do not report results for leave-one-domain-out or oracle selection. What do you mean by "reproduce" here?

p.s. I'm on vacation until 8/7, so my responses may be delayed.

I understand that your code only supports the training-validation method.
However, I was wondering how we can get results for the other model selection techniques.
Could you share those results if you have access to them, or help us modify the SWAD code to obtain results for the other model selection methods?

SWAD chooses the averaging range by training-validation loss in https://github.com/khanrc/swad/blob/main/domainbed/trainer.py#L250. You can plug a different loss (oracle or leave-one-domain-out loss) in here. However, please note that SWAD is designed around the training-validation loss; we do not know how it performs with a different loss.
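
(A loose sketch of that idea, not the repo's actual implementation; the function name and tolerance value are assumptions. The point is that the averaging window is chosen from a per-step validation-loss sequence, so substituting an oracle or leave-one-domain-out loss only changes which split the losses come from.)

def choose_window(losses, r=1.2):
    # Start the averaging window at the loss minimum; end it once the
    # loss rises above r times that minimum (a simple "loss valley").
    start = min(range(len(losses)), key=losses.__getitem__)
    end = len(losses)
    for t in range(start + 1, len(losses)):
        if losses[t] > r * losses[start]:
            end = t
            break
    return start, end

# Example: the validation loss dips then rises; the window spans the valley.
val_losses = [1.0, 0.8, 0.6, 0.55, 0.6, 0.75, 0.9]
print(choose_window(val_losses))  # (3, 5)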

Thank you for your clarification!