thuml / Transfer-Learning-Library

Transfer Learning Library for Domain Adaptation, Task Adaptation, and Domain Generalization

Home Page: http://transfer.thuml.ai

Training Strategy of MLDG

CinKKKyo opened this issue

I have a question about the training process of MLDG. The data used in this repo appears to have the same size across sites (e.g. SiteA (20 samples) and SiteB (20 samples) for meta_train, SiteC (20 samples) for meta_test). What if the multi-site data sizes are not the same (e.g. SiteA (30 samples) and SiteB (15 samples) for meta_train, SiteC (20 samples) for meta_test)? Could I run training within one epoch as follows: whenever the smallest site has been fully iterated, reload it via DataLoader(), so that the number of iterations per epoch is determined by the largest site (i.e. every sample in the largest site is seen once)?

Sorry, I fail to understand your meaning here. It seems that you want to sample a different number of samples from each domain. This can be achieved by modifying the following implementation.
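For illustration only, here is a minimal sketch of that idea under assumed names (`domain1`, `domain2`, and the tensor shapes are hypothetical, not part of the library): each domain gets its own dataset and DataLoader, so both the dataset size and the per-step batch size can be chosen independently per domain.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for two meta-train domains of unequal size
# (42 vs. 80 samples, matching the numbers discussed in this thread).
domain1 = TensorDataset(torch.randn(42, 3, 32, 32), torch.randint(0, 10, (42,)))
domain2 = TensorDataset(torch.randn(80, 3, 32, 32), torch.randint(0, 10, (80,)))

# One DataLoader per domain; the batch size could also be set
# independently per domain if you want to draw a different number
# of samples from each domain at every step.
loader1 = DataLoader(domain1, batch_size=8, shuffle=True, drop_last=True)
loader2 = DataLoader(domain2, batch_size=8, shuffle=True, drop_last=True)
```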

Sorry, please forgive my poor English... I am currently running a DG experiment. My setup uses four different datasets for training (each dataset is treated as one domain, rather than concatenating all datasets and randomly splitting them): two domains for meta-train, one for meta-test, and one for validation. The problem is that the meta-train and meta-test domains have different sample counts: in meta-train, domain1 has 42 samples and domain2 has 80; meta-test has 50. In this situation, in order to iterate over all samples of domain2 within each epoch, I have to reset the domain1 and meta-test loaders each time they are exhausted. Is this strategy feasible for training the MLDG algorithm?

Sorry for the late reply again, haha. The strategy you describe is actually standard practice in Domain Adaptation. Since different domains usually contain different numbers of samples, resetting a loader once it has been iterated through is very common. In practice, no explicit reset is needed: just wrap the dataloader in an abstraction layer that guarantees it can cycle indefinitely, such as the ForeverDataIterator implementation in our library.
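For reference, a minimal sketch of the idea behind such a cycling wrapper; the actual ForeverDataIterator in the library may differ in details (e.g. optional device placement):

```python
from torch.utils.data import DataLoader

class ForeverDataIterator:
    """Wrap a DataLoader so that it never raises StopIteration:
    when the underlying iterator is exhausted, it is re-created
    (which also reshuffles the data when shuffle=True)."""

    def __init__(self, data_loader: DataLoader):
        self.data_loader = data_loader
        self.iter = iter(self.data_loader)

    def __next__(self):
        try:
            return next(self.iter)
        except StopIteration:
            # Loader exhausted: restart it and keep going.
            self.iter = iter(self.data_loader)
            return next(self.iter)

    def __len__(self):
        return len(self.data_loader)
```

With the hypothetical loaders from the sketch above, one epoch can simply run for `len(loader2)` iterations (the largest domain), while the smaller domain cycles transparently:

```python
train_iter1 = ForeverDataIterator(loader1)  # 42 samples -> 5 batches, cycled
train_iter2 = ForeverDataIterator(loader2)  # 80 samples -> 10 batches

for _ in range(len(loader2)):   # epoch length set by the largest domain
    x1, y1 = next(train_iter1)  # meta-train batch from domain1
    x2, y2 = next(train_iter2)  # meta-train batch from domain2
    # ... MLDG meta-train / meta-test update goes here ...
```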

Got it! Thank you so much for the thoughtful reply! I will go take a look at the ForeverDataIterator you mentioned.