pumpikano / tf-dann

Domain-Adversarial Neural Network in Tensorflow

total loss

jakc4103 opened this issue

First of all, thanks for implementing this. It's really awesome!
I'm currently modifying the code to run it on the Office dataset.
I found a small difference in the definition of the total loss.

In the original paper, the total_loss is defined as:
predict_loss + lambda * domain_loss

In the code, the lambda term seems to be missing.
I think that's one of the reasons the total_loss sometimes blows up to NaN.

Hi @jakc4103 ,

I think lambda is the l parameter passed to flip_gradient:
feat = flip_gradient(self.feature, self.l)
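
To see why lambda does not need to appear in the loss itself, here is a minimal scalar sketch. It assumes flip_gradient behaves like the gradient reversal layer in the paper: identity in the forward pass, and multiplying the incoming gradient by -l in the backward pass. The toy model (a single weight w, heads a and b, squared-error losses) and all function names are hypothetical and are not taken from the repository.

```python
def losses(w, x, a, b, y, d):
    """Toy forward pass: one shared feature f = w * x feeds two heads."""
    f = w * x
    predict_loss = (a * f - y) ** 2   # label-predictor loss
    domain_loss = (b * f - d) ** 2    # domain-classifier loss
    return predict_loss, domain_loss


def feature_grad_with_reversal(w, x, a, b, y, d, l):
    """Manual backward pass for w when domain_loss sits behind gradient reversal.

    The forward value of the summed loss is predict_loss + domain_loss (no lambda),
    but the gradient reaching w is d(predict_loss)/dw - l * d(domain_loss)/dw.
    """
    f = w * x
    d_predict_df = 2.0 * (a * f - y) * a
    d_domain_df = 2.0 * (b * f - d) * b
    # Gradient reversal: flip the sign and scale by l on the way back.
    d_domain_df_reversed = -l * d_domain_df
    return (d_predict_df + d_domain_df_reversed) * x


def numeric_grad_of_weighted_objective(w, x, a, b, y, d, l, eps=1e-6):
    """Central finite difference of predict_loss - l * domain_loss w.r.t. w."""
    def objective(w_):
        p, q = losses(w_, x, a, b, y, d)
        return p - l * q
    return (objective(w + eps) - objective(w - eps)) / (2.0 * eps)
```

Under these assumptions the two gradients agree, so for the feature extractor the l argument plays the role of lambda even though the summed loss contains no explicit weight.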

l is calculated as follows (see the training loop in MNIST-DANN.ipynb):
# Adaptation param and learning rate schedule as described in the paper
p = float(i) / num_steps
l = 2. / (1. + np.exp(-10. * p)) - 1
lr = 0.01 / (1. + 10 * p)**0.75
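
The same schedule can be wrapped in a small standalone function to inspect how l and lr evolve. This is only a sketch: the function name dann_schedule and its keyword arguments are made up for illustration, and the defaults simply mirror the constants in the snippet above.

```python
import math


def dann_schedule(step, num_steps, lr0=0.01, gamma=10.0, alpha=10.0, beta=0.75):
    """Return (l, lr) for a training step, following the snippet above.

    p runs from 0 to 1 over training; l ramps from 0 towards 1,
    while lr decays from lr0.
    """
    p = float(step) / num_steps
    l = 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0
    lr = lr0 / (1.0 + alpha * p) ** beta
    return l, lr


if __name__ == "__main__":
    num_steps = 1000
    for step in (0, 250, 500, 750, 1000):
        l, lr = dann_schedule(step, num_steps)
        print(step, round(l, 4), round(lr, 5))
```

At step 0 the adaptation parameter is exactly 0, so the domain loss has no influence on the feature extractor at the start, and it approaches 1 as training proceeds.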

I have a similar problem to @jakc4103 when using AlexNet as the feature network. The total_loss always goes to NaN with the default adaptation parameter and learning rate. How should I set the adaptation parameter and learning rate to handle the loss explosion?

@Goldit sorry for my misunderstanding. You are right, lambda is a parameter in flip_gradient.

@yenlianglintw I think you can try lowering the learning rate, or at least lowering it for the pre-trained AlexNet layers. This should help with the NaN loss problem.
But in my experience, the training sometimes fails to converge for reasons I don't understand. I think this is probably why the original paper did not report all 6 adaptation tasks on the Office dataset.
BTW, another option is not to fine-tune the parameters of the pre-trained model at all and to train only the additional layers you added. This allows you to use a larger learning rate during training.
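
Both suggestions (a smaller learning rate for the pre-trained layers, or freezing them) amount to scaling the update per parameter group. A minimal framework-free sketch of that idea follows; the "group/name" naming convention, the sgd_step function, and the multiplier values are hypothetical and would need to be mapped onto the variable scopes and optimizer of your own model.

```python
def sgd_step(params, grads, base_lr, lr_mult):
    """One plain SGD update with a per-group learning-rate multiplier.

    params, grads: dicts mapping "group/name" -> float.
    lr_mult: dict mapping group -> multiplier; groups not listed use 1.0.
    A multiplier of 0.0 freezes the group.
    """
    updated = {}
    for name, value in params.items():
        group = name.split("/")[0]
        scale = lr_mult.get(group, 1.0)
        updated[name] = value - base_lr * scale * grads[name]
    return updated


params = {"pretrained/conv1": 1.0, "new/fc": 1.0}
grads = {"pretrained/conv1": 2.0, "new/fc": 2.0}

# Option 1: fine-tune the pre-trained layers with a 10x smaller learning rate.
slow = sgd_step(params, grads, base_lr=0.01, lr_mult={"pretrained": 0.1})

# Option 2: freeze the pre-trained layers and train only the new ones.
frozen = sgd_step(params, grads, base_lr=0.01, lr_mult={"pretrained": 0.0})
```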