pumpikano / tf-dann

Domain-Adversarial Neural Network in Tensorflow

total loss

jakc4103 opened this issue · comments

Hi
First of all, thanks for implementing this. It's really awesome!
I'm currently modifying the code to run it on the Office dataset.
I found a small difference in the definition of the total loss.

In the original paper, the total loss is defined as:
predict_loss + lambda * domain_loss

In the code, it seems that the lambda term is missing.
I think that's one of the reasons the total_loss sometimes blows up to NaN.

Hi @jakc4103 ,

I think lambda is the l parameter in flip_gradient:
feat = flip_gradient(self.feature, self.l)

l is calculated as follows (see the training loop in MNIST-DANN.ipynb):
# Adaptation param and learning rate schedule as described in the paper
p = float(i) / num_steps
l = 2. / (1. + np.exp(-10. * p)) - 1
lr = 0.01 / (1. + 10 * p)**0.75
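In case it helps, here is a minimal sketch of what the gradient reversal layer does (the name and implementation below are just for illustration, not copied from flip_gradient.py): the forward pass is an identity, and the backward pass multiplies the incoming gradient by -l. Since the domain loss reaches the feature extractor only through this op, scaling the reversed gradient by l has the same effect on the feature weights as weighting the domain loss by lambda in the total loss.

```python
import tensorflow as tf
from tensorflow.python.framework import ops

# Illustrative gradient-reversal op (TF1-style, like the rest of the repo).
# Forward pass: identity. Backward pass: incoming gradient is scaled by -l.
def flip_gradient_sketch(x, l=1.0):
    # A real implementation needs a unique gradient name per call;
    # a fixed name is enough for this one-off sketch.
    grad_name = "FlipGradientSketch"

    @ops.RegisterGradient(grad_name)
    def _flip_gradients(op, grad):
        return [tf.negative(grad) * l]

    g = tf.get_default_graph()
    with g.gradient_override_map({"Identity": grad_name}):
        return tf.identity(x)

# With feat = flip_gradient_sketch(feature, l), the gradient of
# pred_loss + domain_loss w.r.t. the feature extractor's weights becomes
#   d(pred_loss)/d(theta_f) - l * d(domain_loss)/d(theta_f),
# i.e. the lambda from the paper is exactly the l passed in here.
```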

I have a similar problem to @jakc4103's when using AlexNet as the feature network. The total_loss always goes to NaN with the default adaptation parameter and learning rate. How should I set the adaptation parameter and learning rate to avoid the loss exploding?

@Goldit sorry for my misunderstanding. You are right, lambda is a parameter in flip_gradient.

@yenlianglintw I think you can try setting the learning rate lower, or at least lower it for the pre-trained AlexNet layers. This should help with the NaN loss problem.
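For example, something like this (just a sketch; the scope names "alexnet" and "new_layers" are placeholders for whatever scopes your model actually uses, and total_loss is assumed to be the model's combined loss):

```python
import tensorflow as tf

learning_rate = tf.placeholder(tf.float32, [])

# Collect pre-trained and newly added variables by scope
# (scope names are assumptions for illustration).
pretrained_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="alexnet")
new_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="new_layers")

# Much smaller learning rate for the pre-trained AlexNet layers,
# normal learning rate for the layers added on top.
opt_pretrained = tf.train.MomentumOptimizer(learning_rate * 0.1, 0.9)
opt_new = tf.train.MomentumOptimizer(learning_rate, 0.9)

# total_loss is the model's combined prediction + domain loss.
grads = tf.gradients(total_loss, pretrained_vars + new_vars)
grads_pretrained = grads[:len(pretrained_vars)]
grads_new = grads[len(pretrained_vars):]

train_op = tf.group(
    opt_pretrained.apply_gradients(list(zip(grads_pretrained, pretrained_vars))),
    opt_new.apply_gradients(list(zip(grads_new, new_vars))))
```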
In my experience, though, the training sometimes fails to converge for reasons I don't know. I suspect this is why the original paper did not report all six adaptation tasks on the Office dataset.
BTW, another option is to not fine-tune the parameters of the pre-trained model and only train the additional layers you added; this lets you use a larger learning rate during training.
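A rough sketch of that (again assuming the added layers live under a "new_layers" scope and total_loss is already built):

```python
import tensorflow as tf

learning_rate = tf.placeholder(tf.float32, [])

# Only the newly added layers are handed to the optimizer, so the pre-trained
# AlexNet weights stay frozen and a larger learning rate is safe to use.
new_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="new_layers")
train_op = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(
    total_loss, var_list=new_vars)
```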