raymon-tian / hourglass-facekeypoints-detection

face keypoints detection based on stacked hourglass

Training loss is decreasing, but testing loss and MSE never change?

ZoieMo opened this issue · comments

I applied the same model as yours to another face keypoints dataset and kept the same experiment settings, but the training behaves strangely. Although the training loss decreases normally, the testing loss stays constant after a few epochs. Worse, neither the training MSE nor the testing MSE changes at all. How should I set the optimizer and the learning rate?

You can debug the training by increasing the learning rate by several orders of magnitude. For example, if you set lr to 1.0e8 and the loss still does not explode, backpropagation is not working correctly.
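The idea behind this check can be illustrated on a toy objective (a hypothetical 1-D example, not the repo's code): with a sane learning rate the loss shrinks, with an absurdly large one it explodes, and if it does neither, the gradient is not reaching the weights at all.

```python
# Gradient descent on a 1-D quadratic loss L(w) = w**2.
def run(lr, steps=10):
    w = 1.0
    for _ in range(steps):
        w -= lr * 2 * w  # analytic gradient of w**2 is 2*w
    return w ** 2  # final loss value

print(run(0.1))   # decays toward 0
print(run(1e8))   # blows up to a huge value
```

If a network's loss stays flat even at lr = 1e8, the usual suspects are a detached computation graph, a missing `loss.backward()`/`optimizer.step()`, or gradients being zeroed before the step.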

Thank you very much for your answer. I've tried learning rates spanning many orders of magnitude, from small to large, but the results are the same as described above. A screenshot of the training procedure follows:

[image: training log screenshot]

I can't figure out what's wrong with my code yet. Which exact model did you use for facial keypoints detection? Is it the model in hg.py?

No, the model I used is KFSGNet in models.py, which is instantiated at line 123 of train.py. KFSGNet contains only one hourglass. In addition, if you want to train the model on another dataset, you may need to tune the hyperparameter config['sigma'].
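For context, config['sigma'] presumably controls the spread of the Gaussian target heatmaps. A hypothetical sketch (not the repo's exact code) of why this needs retuning per dataset:

```python
import numpy as np

def gaussian_heatmap(h, w, cx, cy, sigma):
    """Render a 2D Gaussian target heatmap peaking at (cx, cy).

    sigma sets how many pixels around the keypoint carry supervision;
    it typically needs retuning when image resolution or keypoint
    density changes across datasets.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

# A larger sigma spreads the same peak over more pixels.
narrow = gaussian_heatmap(64, 64, 32, 32, sigma=1.0)
wide = gaussian_heatmap(64, 64, 32, 32, sigma=4.0)
print((narrow > 0.5).sum(), (wide > 0.5).sum())  # wide covers more pixels
```

If sigma is too small for the heatmap resolution, almost every target pixel is zero and the network can reach a low loss by predicting all zeros, which matches the flat-MSE symptom described above.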

I finally found the problem. First, using the mask may not be accurate, because it only considers the positive samples and no negative samples contribute to the loss. So I commented out these two lines of code:
[image: the two commented-out mask lines in the loss code]

That said, it is fine if the mask is only used for samples without landmark annotations.
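The effect described above can be shown with a toy example (hypothetical arrays, not the repo's code): a false-positive peak outside the masked region contributes nothing to a masked MSE, so the loss cannot push it down.

```python
import numpy as np

# Toy single-keypoint heatmaps: the target has one "on" pixel, while the
# prediction fires both there and at a spurious location.
target = np.zeros((8, 8)); target[2, 2] = 1.0
pred = np.zeros((8, 8)); pred[2, 2] = 1.0; pred[6, 6] = 1.0  # false positive

mask = target > 0  # keeps only the positive pixels

masked_mse = ((pred - target)[mask] ** 2).mean()  # ignores the false positive
full_mse = ((pred - target) ** 2).mean()          # penalizes it

print(masked_mse, full_mse)  # 0.0 vs a positive value
```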

Second, the code here that calculates the MSE metric is wrong. I compute the MSE from the peak points of the output heatmaps instead.
[image: the MSE-calculation code]
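A sketch of the peak-based MSE described above, assuming heatmaps of shape (N, K, H, W) and keypoint coordinates as (x, y) pairs (hypothetical helper names, not the repo's code):

```python
import numpy as np

def peak_coords(heatmaps):
    """Return an (N, K, 2) array of (x, y) peak locations
    from (N, K, H, W) heatmaps via per-map argmax."""
    n, k, h, w = heatmaps.shape
    flat = heatmaps.reshape(n, k, -1).argmax(axis=-1)
    return np.stack([flat % w, flat // w], axis=-1)

def keypoint_mse(pred_maps, gt_coords):
    """MSE between predicted peak coordinates and ground-truth coordinates."""
    return ((peak_coords(pred_maps) - gt_coords) ** 2).mean()

# Toy example: one sample, one keypoint whose heatmap peaks at (x=5, y=3).
maps = np.zeros((1, 1, 8, 8)); maps[0, 0, 3, 5] = 1.0
print(keypoint_mse(maps, np.array([[[5, 3]]])))  # 0.0 when the peak matches
```

Measuring error in pixel coordinates this way makes the metric sensitive to where the heatmap actually peaks, rather than to raw heatmap values that can look fine while the peak is in the wrong place.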

Thanks for your comment. The mask is actually applied to the keypoints that have no ground truth.