Training loss
maolin23 opened this issue
Hi,
Could you tell me about the loss during training?
When I train the front-end model with your code, my loss is about 2-3 for the first 15 iterations.
After that, the loss increases to 50-80 and stays there.
After 20K iterations, it is still about 60-80.
I'm not sure whether this is correct.
Could you tell me whether this situation is normal?
What loss value should I expect?
(My training/testing input images are the originals and I haven't changed anything in train.py.)
Thanks a lot,
Mao
@maolin23 Have you solved this problem? I have run into it as well, and I have tried different batch_size and iter_size values. Sometimes the loss behaves as you describe and sometimes it behaves normally. More specifically, when iter_size is 1 the loss usually behaves normally, and when iter_size is larger the loss always behaves abnormally.
The loss in the initial stage should be around 3.0 for a 19-category classification problem. If you observe something much bigger than that, it probably indicates that the optimization diverges. It is hard to diagnose the exact problem without more information, but if you are using the parameters and datasets described in the dilation paper, this is unlikely to happen.
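For context on the 3.0 figure: a freshly initialized softmax classifier predicts roughly uniform probabilities, so the expected cross-entropy loss at the start is about ln(K) for K classes. A quick check in plain Python (not from the repo):

```python
import math

num_classes = 19              # the 19-category setting mentioned above
print(math.log(num_classes))  # ~2.94, i.e. "around 3.0" at the start of training
```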
@fyu I initialize the front-end net from vgg_conv.caffemodel and only changed batch_size to 8 because of limited GPU memory. It still diverges sometimes.
I hit the same problem with batch size 8, but it is better with batch size 7. Why is there such a big difference from such a small change in batch size?
I just added an option to set iter_size in the training options: https://github.com/fyu/dilation/blob/master/train.py#L233. If your GPU doesn't have enough memory and you have to decrease the batch size, you can try to increase iter_size.
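For anyone unsure what the option does: with iter_size the Caffe solver accumulates gradients over several forward/backward passes before making one weight update, so the effective batch size is batch_size × iter_size. A toy sketch of that idea (plain Python, not the solver's actual code):

```python
# A minimal, self-contained sketch of gradient accumulation (the idea behind
# iter_size), using a 1-D least-squares toy problem.

def gradient(w, batch):
    """Mean gradient of 0.5 * (w*x - y)^2 over one mini-batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
lr = 0.1

# One SGD step with a single batch of 4 samples...
w_big = 0.5
w_big -= lr * gradient(w_big, data)

# ...matches one step with batch_size=2 and iter_size=2: the gradients of the
# two mini-batches are accumulated and averaged before the update.
w_small = 0.5
iter_size = 2
mini_batches = [data[:2], data[2:]]
accumulated = sum(gradient(w_small, b) for b in mini_batches) / iter_size
w_small -= lr * accumulated

print(w_big, w_small)  # identical updates
```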
@jgong5 Unfortunately, the loss becomes bigger after 200 iterations with batch size 6.
@jgong5 @fyu After I initialized the net from vgg_conv and used Xavier initialization for the new weights, the training loss looks better and goes down as the iterations increase; after 30K iterations the training loss is about 10^-6. However, the test accuracy is always -nan, and the test results are all black. I am training on my own custom dataset.
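A note on the -nan accuracy with a custom dataset (this is a guess at a common cause, not something confirmed in this thread): if the label values do not match the expected class IDs, for example if most pixels end up as the ignore label, the accuracy can be averaged over zero valid pixels and come out as NaN. Scanning the unique label values is a quick way to rule that out; the glob pattern is a placeholder:

```python
import glob
import numpy as np
from PIL import Image

values = set()
for path in glob.glob("labels/*.png"):  # placeholder pattern for your label files
    values.update(np.unique(np.array(Image.open(path))).tolist())

# Expect class IDs in [0, num_classes) plus the ignore label (255 in VOC-style data).
print(sorted(values))
```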
Hi @fyu
I have run the training of the VGG front-end model based on the documentation you provided.
However, the loss appears to diverge very early, as you can see in this log.
I have cross-checked the hyper-parameters mentioned in the paper against the ones in the documentation, and they match.
The same divergence issue can be seen with the joint training.
I am running your code using cuda-8.0 and cudnn-5.
Can you kindly run your demo from scratch and tell us where the issue might be?
A lot of people here seem to be facing the same issue.
Thanks!
Are the labels one channel or RGB?
One channel.
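If your labels were exported as RGB images, they need to be converted before training. A minimal sketch using Pillow; the file name is a placeholder, and the convert step only makes sense if all three channels already carry the same class ID (color-coded labels need a dataset-specific color-to-class lookup instead):

```python
from PIL import Image

label = Image.open("label.png")  # placeholder path
print(label.mode)                # 'L' or 'P' means single channel; 'RGB' means three channels

if label.mode not in ("L", "P"):
    # Only valid when R == G == B == class ID for every pixel.
    label.convert("L").save("label_single_channel.png")
```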
Hi @fyu, thank you for your excellent code. I have run into a problem: when I use the trained models (loss near 2) and test_net.txt (frontend or joint) to predict on an image, the resulting image is always completely black.
Is there anything I need to do before prediction? Thanks in advance.
@HXKwindwizard If the loss is 2, it is a bad sign; it means the model is not working properly. Probably your data is too different from what the model was trained on. Training the model on your own data may solve the problem.
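One sanity check before concluding the model is completely broken (a general suggestion, not something from this repo's docs): a raw class-index map saved directly as an image looks almost black because the class IDs are tiny pixel values. Mapping the indices through a color palette makes it easier to see whether the prediction really is empty; the palette below is a made-up 3-class example:

```python
import numpy as np
from PIL import Image

def colorize(prediction, palette):
    """prediction: 2-D array of class IDs; palette: (num_classes, 3) uint8 colors."""
    return Image.fromarray(palette[prediction].astype(np.uint8))

palette = np.array([[0, 0, 0], [128, 0, 0], [0, 128, 0]], dtype=np.uint8)  # made-up colors
pred = np.array([[0, 1], [2, 1]])                                          # made-up prediction
colorize(pred, palette).save("prediction_color.png")
```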
@fyu Thanks for the reminder. I use the PASCAL VOC dataset and fine-tune from the VGG weights you suggested. I have done several training runs on this data. Even when the loss is around 10, the situation I mentioned above still exists. So I wonder: even if the training is not good, should the predicted image really always be completely black? When I use your pretrained model to do the prediction, the result is quite good. Is there any relationship with the network structure? (I use test.net as the prototxt.)