fungtion / DANN_py3

Python 3 PyTorch implementation of DANN


Size of target domain

jsetty opened this issue

I have a training set with 50k source images and 1k target images. Is DANN a good approach for this use case? If not, what is your recommendation?

It's decided not only by the numbers, but also by the similarity between the source and target images: the bigger the difference, the more data is needed.

All the works I've seen applying this UDA technique use roughly the same amount of data in the source and target domains. Now, I am working on a project in which the number of source images is much greater than the number of target images. I am not sure whether this is a problem, though.

The only thing is that, by setting num_batches = min(len(train_loader), len(target_loader)) and looping over num_batches as:

for epoch in range(NUM_EPOCHS):
    for batch_index in range(num_batches):
        # forward pass on one source batch and one target batch
        # backward pass and optimizer step

it would take many "epochs" (maybe calling them "iterations" would be better) to go through the entire training set.
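For concreteness, here is a minimal runnable sketch of that truncated loop. The loaders are hypothetical stand-ins built from random tensors (the real code would use the actual image datasets), and the forward/backward steps are placeholders:

import torch
from torch.utils.data import DataLoader, TensorDataset

# hypothetical stand-ins for the real datasets: dummy feature vectors
# instead of images, purely so the sketch runs on its own
train_loader = DataLoader(TensorDataset(torch.randn(50_000, 8),
                                        torch.randint(0, 10, (50_000,))),
                          batch_size=128, shuffle=True)
target_loader = DataLoader(TensorDataset(torch.randn(1_000, 8),
                                         torch.randint(0, 10, (1_000,))),
                           batch_size=128, shuffle=True)

NUM_EPOCHS = 10
num_batches = min(len(train_loader), len(target_loader))  # 8 here: ceil(1000 / 128)

for epoch in range(NUM_EPOCHS):
    source_iter, target_iter = iter(train_loader), iter(target_loader)
    for batch_index in range(num_batches):
        src_x, src_y = next(source_iter)
        tgt_x, _ = next(target_iter)  # target labels stay unused in UDA
        # forward pass: class loss on src_x/src_y, domain loss on both batches
        # backward pass and optimizer step go here

With these sizes, each "epoch" sees only 8 of the ~391 source batches, which is exactly the coverage problem described above.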

I think it is possible to loop over the entire training set (i.e., num_batches = len(train_loader)) but force the target data to repeat itself multiple times within a given "epoch". To do that, you can use the cycle function from itertools, e.g. target_loader = cycle(iter(target_loader)). Then you could apply some data augmentation technique to get around the problem of repeating the target data. Does it make sense?
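Here is a minimal sketch of that cycle-based variant, reusing the hypothetical train_loader and target_loader from the snippet above. One detail worth flagging: itertools.cycle saves a copy of every element it yields and replays the saved copies once the iterable is exhausted, which matters for shuffling and augmentation:

from itertools import cycle

# cycle() accepts any iterable, so the inner iter() mentioned above is
# optional; note that cycle caches each batch it yields and replays the
# cached copies once the loader is exhausted
target_cycle = cycle(target_loader)

for epoch in range(NUM_EPOCHS):
    # num_batches == len(train_loader): one "epoch" covers the whole source set
    for src_x, src_y in train_loader:
        tgt_x, _ = next(target_cycle)
        # forward pass: class loss on source, domain loss on both batches
        # backward pass and optimizer step go here

Because of that caching, per-epoch shuffling and any random augmentation applied inside the target dataset are frozen after the first pass; if fresh augmentation on every repeat matters, rebuilding iter(target_loader) whenever it raises StopIteration is a common alternative.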

Does it make sense?

Yes, thanks!