Training loss
maolin23 opened this issue
Hi,
Could you tell me about the loss during training?
When I train the front-end model with your code, my loss is about 2-3 for the first 15 iterations.
After that, the loss increases to 50-80 and stays there.
After 20K iterations, it is still about 60-80.
I'm not sure whether this is correct.
Could you tell me whether this situation is normal?
What loss value should I expect?
(My training/testing input images are the originals and I haven't changed anything in train.py.)
Thanks a lot,
Mao
@maolin23 Have you solved this problem? I have run into it as well, and I have tried different batch_size and iter_size values. Sometimes the loss behaves as you describe and sometimes it behaves normally. More specifically, when iter_size is 1 the loss usually behaves normally, and when iter_size is larger the loss always behaves abnormally.
The loss in the initial stage should be around 3.0 for a 19-category classification problem. If you observe something much bigger than that, it probably indicates that the optimization diverges. It is hard to diagnose the exact problem without more information, but if you are using the parameters and datasets described in the dilation paper, this is unlikely to happen.
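For context on the 3.0 figure: a freshly initialized softmax classifier predicts roughly uniform probabilities, so the expected cross-entropy loss at the start is about ln(K) for K classes. A quick check in plain Python (not from the repo):

```python
import math

num_classes = 19              # the 19-category setting mentioned above
print(math.log(num_classes))  # ~2.94, i.e. "around 3.0" at the start of training
```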
@fyu I initialize the front-end net from vgg_conv.caffemodel and only changed batch_size to 8 because of limited GPU memory. It still diverges sometimes.
I hit the same problem with batch size 8, but it is better with batch size 7. Why is there such a big difference from such a small change in batch size?
I just added an option to set iter_size in the training options: https://github.com/fyu/dilation/blob/master/train.py#L233. If your GPU doesn't have enough memory and you have to decrease the batch size, you can try to increase iter_size.
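For anyone unsure what the option does: with iter_size the Caffe solver accumulates gradients over several forward/backward passes before making one weight update, so the effective batch size is batch_size × iter_size. A toy sketch of that idea (plain Python, not the solver's actual code):

```python
# A minimal, self-contained sketch of gradient accumulation (the idea behind
# iter_size), using a 1-D least-squares toy problem.

def gradient(w, batch):
    """Mean gradient of 0.5 * (w*x - y)^2 over one mini-batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
lr = 0.1

# One SGD step with a single batch of 4 samples...
w_big = 0.5
w_big -= lr * gradient(w_big, data)

# ...matches one step with batch_size=2 and iter_size=2: the gradients of the
# two mini-batches are accumulated and averaged before the update.
w_small = 0.5
iter_size = 2
mini_batches = [data[:2], data[2:]]
accumulated = sum(gradient(w_small, b) for b in mini_batches) / iter_size
w_small -= lr * accumulated

print(w_big, w_small)  # identical updates
```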
@jgong5 Unfortunately, the loss becomes bigger after 200 iterations with batch size 6.
@jgong5 @fyu After I initialized the net from vgg_conv and used Xavier initialization for the new weights, the training loss looks better and goes down as the iterations increase; after 30K iterations the training loss is about 10^-6. However, the test accuracy is always -nan, and the test results are all black. I am training on my own custom dataset.
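A note on the -nan accuracy with a custom dataset (this is a guess at a common cause, not something confirmed in this thread): if the label values do not match the expected class IDs, for example if most pixels end up as the ignore label, the accuracy can be averaged over zero valid pixels and come out as NaN. Scanning the unique label values is a quick way to rule that out; the glob pattern is a placeholder:

```python
import glob
import numpy as np
from PIL import Image

values = set()
for path in glob.glob("labels/*.png"):  # placeholder pattern for your label files
    values.update(np.unique(np.array(Image.open(path))).tolist())

# Expect class IDs in [0, num_classes) plus the ignore label (255 in VOC-style data).
print(sorted(values))
```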
Hi @fyu
I have run the training of the VGG front-end model based on the documentation you provided.
However, the loss appears to diverge very early, as you can see in this log.
I have cross-checked the hyper-parameters mentioned in the paper against the ones in the documentation, and they match.
The same divergence issue can be seen with the joint training.
I am running your code using cuda-8.0 and cudnn-5.
Can you kindly run your demo from scratch and tell us where the issue might be?
A lot of people here seem to be facing the same issue.
Thanks!
Are the labels one channel or RGB?
One channel.
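If your labels were exported as RGB images, they need to be converted before training. A minimal sketch using Pillow; the file name is a placeholder, and the convert step only makes sense if all three channels already carry the same class ID (color-coded labels need a dataset-specific color-to-class lookup instead):

```python
from PIL import Image

label = Image.open("label.png")  # placeholder path
print(label.mode)                # 'L' or 'P' means single channel; 'RGB' means three channels

if label.mode not in ("L", "P"):
    # Only valid when R == G == B == class ID for every pixel.
    label.convert("L").save("label_single_channel.png")
```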
Hi @fyu, thank you for your excellent code. I have run into a problem: when I use the trained models (loss near 2) and test_net.txt (frontend or joint) to predict on an image, the resulting image is always completely black.
Is there anything I need to do before prediction? Thanks in advance.
@HXKwindwizard If the loss is 2, it is a bad sign; it means the model is not working properly. Probably your data is too different from what the model was trained on. Training the model on your own data may solve the problem.
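One sanity check before concluding the model is completely broken (a general suggestion, not something from this repo's docs): a raw class-index map saved directly as an image looks almost black because the class IDs are tiny pixel values. Mapping the indices through a color palette makes it easier to see whether the prediction really is empty; the palette below is a made-up 3-class example:

```python
import numpy as np
from PIL import Image

def colorize(prediction, palette):
    """prediction: 2-D array of class IDs; palette: (num_classes, 3) uint8 colors."""
    return Image.fromarray(palette[prediction].astype(np.uint8))

palette = np.array([[0, 0, 0], [128, 0, 0], [0, 128, 0]], dtype=np.uint8)  # made-up colors
pred = np.array([[0, 1], [2, 1]])                                          # made-up prediction
colorize(pred, palette).save("prediction_color.png")
```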
@fyu Thanks for the reminder. I use the PASCAL VOC dataset and fine-tune from the VGG weights you suggested. I have done several training runs on this data. Even when the loss is around 10, the situation I mentioned above still exists. So I wonder: even if the training is not good, should the predicted image really always be completely black? When I use your pretrained model to do the prediction, the result is quite good. Is there any relationship with the network structure? (I use test.net as the prototxt.)