thuml / Transfer-Learning-Library

Transfer Learning Library for Domain Adaptation, Task Adaptation, and Domain Generalization

Home Page: http://transfer.thuml.ai

Training Strategy of MLDG

CinKKKyo opened this issue

I have a question about the training process of MLDG. The data used in this repo appears to have the same size across sites (e.g. SiteA (20 samples) and SiteB (20 samples) for meta_train, SiteC (20 samples) for meta_test). What if the multi-site data sizes are not the same (e.g. SiteA (30 samples) and SiteB (15 samples) for meta_train, SiteC (20 samples) for meta_test)? Could I run training within one epoch as follows: whenever the smallest site has been fully iterated, reload it via DataLoader(), so that the number of iterations per epoch is determined by the largest site (i.e. every sample in the largest site is seen once)?

Sorry, I fail to understand your meaning here. It seems that you want to sample a different number of samples from each domain. This can be achieved by modifying the following implementation.
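For illustration only, here is a minimal sketch of that idea under assumed names (`domain1`, `domain2`, and the tensor shapes are hypothetical, not part of the library): each domain gets its own dataset and DataLoader, so both the dataset size and the per-step batch size can be chosen independently per domain.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for two meta-train domains of unequal size
# (42 vs. 80 samples, matching the numbers discussed in this thread).
domain1 = TensorDataset(torch.randn(42, 3, 32, 32), torch.randint(0, 10, (42,)))
domain2 = TensorDataset(torch.randn(80, 3, 32, 32), torch.randint(0, 10, (80,)))

# One DataLoader per domain; the batch size could also be set
# independently per domain if you want to draw a different number
# of samples from each domain at every step.
loader1 = DataLoader(domain1, batch_size=8, shuffle=True, drop_last=True)
loader2 = DataLoader(domain2, batch_size=8, shuffle=True, drop_last=True)
```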

Sorry, please forgive my poor English... I am currently running a DG experiment. My setup uses four different datasets for training (each dataset is treated as one domain, rather than concatenating all datasets and randomly splitting them): two domains for meta-train, one for meta-test, and one for validation. The problem is that the meta-train and meta-test domains have different sample counts: in meta-train, domain1 has 42 samples and domain2 has 80; meta-test has 50. In this situation, in order to iterate over all samples of domain2 within each epoch, I have to reset the domain1 and meta-test loaders each time they are exhausted. Is this strategy feasible for training the MLDG algorithm?

Sorry for the late reply again, haha. The strategy you describe is actually standard practice in Domain Adaptation. Since different domains usually contain different numbers of samples, resetting a loader once it has been iterated through is very common. In practice, no explicit reset is needed: just wrap the dataloader in an abstraction layer that guarantees it can cycle indefinitely, such as the ForeverDataIterator implementation in our library.
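For reference, a minimal sketch of the idea behind such a cycling wrapper; the actual ForeverDataIterator in the library may differ in details (e.g. optional device placement):

```python
from torch.utils.data import DataLoader

class ForeverDataIterator:
    """Wrap a DataLoader so that it never raises StopIteration:
    when the underlying iterator is exhausted, it is re-created
    (which also reshuffles the data when shuffle=True)."""

    def __init__(self, data_loader: DataLoader):
        self.data_loader = data_loader
        self.iter = iter(self.data_loader)

    def __next__(self):
        try:
            return next(self.iter)
        except StopIteration:
            # Loader exhausted: restart it and keep going.
            self.iter = iter(self.data_loader)
            return next(self.iter)

    def __len__(self):
        return len(self.data_loader)
```

With the hypothetical loaders from the sketch above, one epoch can simply run for `len(loader2)` iterations (the largest domain), while the smaller domain cycles transparently:

```python
train_iter1 = ForeverDataIterator(loader1)  # 42 samples -> 5 batches, cycled
train_iter2 = ForeverDataIterator(loader2)  # 80 samples -> 10 batches

for _ in range(len(loader2)):   # epoch length set by the largest domain
    x1, y1 = next(train_iter1)  # meta-train batch from domain1
    x2, y2 = next(train_iter2)  # meta-train batch from domain2
    # ... MLDG meta-train / meta-test update goes here ...
```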

Got it! Thank you so much for the thoughtful reply! I will go take a look at the ForeverDataIterator you mentioned.