fungtion / DANN_py3

Python 3 PyTorch implementation of DANN


Size of target domain

jsetty opened this issue

I have a training set with 50k source images and 1k target images. Is DANN a good approach for this use case? If not, what is your recommendation?

It's decided not only by the numbers, but also by the similarity between the source and target images: the bigger the difference, the more data is needed.

All the works I've seen applying this UDA technique use roughly the same amount of data in the source and target domains. Now, I am working on a project in which the number of source images is much greater than the number of target images. I am not sure whether this is a problem, though.

The only thing is that, by setting num_batches = min(len(train_loader), len(target_loader)) and looping over num_batches as:

for epoch in range(NUM_EPOCHS):
    for batch_index in range(num_batches):
        # forward pass on one source batch and one target batch
        # backward pass and optimizer step

it would take many "epochs" (maybe calling them "iterations" would be better) to go through the entire training set.
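For concreteness, here is a minimal runnable sketch of that truncated loop. The loaders are hypothetical stand-ins built from random tensors (the real code would use the actual image datasets), and the forward/backward steps are placeholders:

import torch
from torch.utils.data import DataLoader, TensorDataset

# hypothetical stand-ins for the real datasets: dummy feature vectors
# instead of images, purely so the sketch runs on its own
train_loader = DataLoader(TensorDataset(torch.randn(50_000, 8),
                                        torch.randint(0, 10, (50_000,))),
                          batch_size=128, shuffle=True)
target_loader = DataLoader(TensorDataset(torch.randn(1_000, 8),
                                         torch.randint(0, 10, (1_000,))),
                           batch_size=128, shuffle=True)

NUM_EPOCHS = 10
num_batches = min(len(train_loader), len(target_loader))  # 8 here: ceil(1000 / 128)

for epoch in range(NUM_EPOCHS):
    source_iter, target_iter = iter(train_loader), iter(target_loader)
    for batch_index in range(num_batches):
        src_x, src_y = next(source_iter)
        tgt_x, _ = next(target_iter)  # target labels stay unused in UDA
        # forward pass: class loss on src_x/src_y, domain loss on both batches
        # backward pass and optimizer step go here

With these sizes, each "epoch" sees only 8 of the ~391 source batches, which is exactly the coverage problem described above.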

I think it is possible to loop over the entire training set (i.e., num_batches = len(train_loader)) but force the target data to repeat itself multiple times within a given "epoch". To do that, you can use the cycle function from itertools, e.g. target_loader = cycle(iter(target_loader)). Then you could apply some data augmentation technique to get around the problem of repeating the target data. Does it make sense?
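Here is a minimal sketch of that cycle-based variant, reusing the hypothetical train_loader and target_loader from the snippet above. One detail worth flagging: itertools.cycle saves a copy of every element it yields and replays the saved copies once the iterable is exhausted, which matters for shuffling and augmentation:

from itertools import cycle

# cycle() accepts any iterable, so the inner iter() mentioned above is
# optional; note that cycle caches each batch it yields and replays the
# cached copies once the loader is exhausted
target_cycle = cycle(target_loader)

for epoch in range(NUM_EPOCHS):
    # num_batches == len(train_loader): one "epoch" covers the whole source set
    for src_x, src_y in train_loader:
        tgt_x, _ = next(target_cycle)
        # forward pass: class loss on source, domain loss on both batches
        # backward pass and optimizer step go here

Because of that caching, per-epoch shuffling and any random augmentation applied inside the target dataset are frozen after the first pass; if fresh augmentation on every repeat matters, rebuilding iter(target_loader) whenever it raises StopIteration is a common alternative.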

Does it make sense?

Yes, thanks!