chuanqi305 / MobileNetv2-SSDLite

Caffe implementation of SSD and SSDLite detection on MobileNetV2, converted from TensorFlow.


The loss value stopped dropping once it came down to about 7.x?

wangzhe0623 opened this issue

Hi @chuanqi305, thanks for your great work. I ran into a problem. I used the scripts to convert the TF model to a Caffe model and got "deploy.caffemodel". After that, I used those weights to fine-tune on my 2-class dataset. The network configuration is exactly as you provided in "ssdlite/voc"; I only changed the layer names and the output channels of the "conf" layers. During training, with a learning rate of 0.0001 at the start, the loss dropped until it reached about 7 and then stopped, so the weights didn't converge at all. What could be wrong with it? Eager for your reply.
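
For reference, a minimal sketch of the kind of conf-layer change described above (the layer name, bottom blob, and the prior count of 6 are assumptions for illustration, not taken from the repo's prototxt): in SSD/SSDLite the conf layer's num_output must equal priors_per_location * num_classes, where num_classes includes the background class, so a 2-class setup would use 6 * 2 = 12. The num_classes field of the MultiBoxLoss layer in the training prototxt has to be changed to 2 as well.

# Illustrative only: layer/blob names and the 6 priors per location are assumptions.
layer {
  name: "conv_13/expand_mbox_conf_new"   # renamed so the old 21-class weights are not copied in
  type: "Convolution"
  bottom: "conv_13/expand"
  top: "conv_13/expand_mbox_conf_new"
  convolution_param {
    num_output: 12   # 6 priors per location * 2 classes (1 object class + background)
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler { type: "msra" }
    bias_filler { type: "constant" value: 0.0 }
  }
}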

I have the same problem...

Same question, have you solved it?

Same question.......

I0802 16:57:27.196552 21083 solver.cpp:259]     Train net output #0: mbox_loss = 6.57651 (* 1 = 6.57651 loss)
I0802 16:57:27.196557 21083 sgd_solver.cpp:138] Iteration 7310, lr = 0.05
I0802 16:57:28.975070 21083 solver.cpp:243] Iteration 7320, loss = 7.2598
I0802 16:57:28.975109 21083 solver.cpp:259]     Train net output #0: mbox_loss = 7.0054 (* 1 = 7.0054 loss)
I0802 16:57:28.975116 21083 sgd_solver.cpp:138] Iteration 7320, lr = 0.05
I0802 16:57:30.831727 21083 solver.cpp:243] Iteration 7330, loss = 7.34756
I0802 16:57:30.831763 21083 solver.cpp:259]     Train net output #0: mbox_loss = 6.6218 (* 1 = 6.6218 loss)
I0802 16:57:30.831768 21083 sgd_solver.cpp:138] Iteration 7330, lr = 0.05
I0802 16:57:32.776068 21083 solver.cpp:243] Iteration 7340, loss = 7.76406
I0802 16:57:32.776100 21083 solver.cpp:259]     Train net output #0: mbox_loss = 7.26238 (* 1 = 7.26238 loss)
I0802 16:57:32.776106 21083 sgd_solver.cpp:138] Iteration 7340, lr = 0.05
I0802 16:57:34.534003 21083 solver.cpp:243] Iteration 7350, loss = 7.65554
I0802 16:57:34.534036 21083 solver.cpp:259]     Train net output #0: mbox_loss = 7.10536 (* 1 = 7.10536 loss)
I0802 16:57:34.534042 21083 sgd_solver.cpp:138] Iteration 7350, lr = 0.05
I0802 16:57:36.399013 21083 solver.cpp:243] Iteration 7360, loss = 6.90834
I0802 16:57:36.399049 21083 solver.cpp:259]     Train net output #0: mbox_loss = 6.39814 (* 1 = 6.39814 loss)
I0802 16:57:36.399055 21083 sgd_solver.cpp:138] Iteration 7360, lr = 0.05
I0802 16:57:38.430330 21083 solver.cpp:243] Iteration 7370, loss = 7.53202
I0802 16:57:38.430369 21083 solver.cpp:259]     Train net output #0: mbox_loss = 7.61347 (* 1 = 7.61347 loss)

just continue to train

@allenwangcheng @jimchen2018
I solved it. My dataset is just quite difficult.

@wangzhe0623 Hello, I ran into the same problem as you. Could you tell me what you mean by a difficult dataset? Is your dataset too complex to train on?

@wangzhe0623 Thanks! But what was your solution? Did you just continue training, or did you change your dataset?

@wangzhe0623 I'd also like to know your solution, thanks!
How large is your training dataset, and how many iterations did you train for?

I have the same question, and I searched for a solution. I found that use_global_stats in batch_norm_param must be false; after I added this parameter, my training loss started to decrease. You can try it, maybe it helps! The BN parameters look like the following:
layer {
  name: "conv_1/expand/bn"
  type: "BatchNorm"
  bottom: "conv_1/expand"
  top: "conv_1/expand"
  batch_norm_param {
    use_global_stats: false
    eps: 1e-5
    #eps: 0.001
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
}

@Hanlos The value of use_global_stats is already false in the TRAIN phase, so I don't think the problem is related to use_global_stats. Did you modify any other values?


@wangzhe0623 Hello, I used WIDER FACE to train the SSDLite model. Could you please tell me how you use deploy.caffemodel to fine-tune on a 2-class dataset? Do you convert the COCO model to a 2-class model directly, or do you convert the COCO model to a VOC model and then use part of those weights to fine-tune your model? Please help me, thanks.
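
In case it helps while waiting for a reply, this is the usual Caffe fine-tuning pattern (a sketch only; the solver and model file names below are placeholders, not the repo's actual paths): pass the converted deploy.caffemodel via --weights, and Caffe copies weights into every layer whose name and blob shapes match, while renamed conf layers start from their weight_filler initialization.

# Sketch with placeholder file names; adjust the paths to your own solver and model files.
caffe train --solver=solver_train.prototxt --weights=deploy.caffemodel --gpu=0 2>&1 | tee train.log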


I have the same problem... Have you solved it? Please help me. Thanks!


Increase the learning rate and adopt an annealing schedule. Using this method, the loss decreased from 4 to 2 in two hours.
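
A sketch of what such a schedule could look like in solver.prototxt (the numbers are illustrative assumptions, not the values the commenter used): raise base_lr and anneal it with a multistep policy.

# Illustrative solver settings only; base_lr, stepvalue and gamma are assumptions.
base_lr: 0.001
lr_policy: "multistep"
gamma: 0.5          # halve the learning rate at each stepvalue
stepvalue: 20000
stepvalue: 40000
stepvalue: 60000
momentum: 0.9
weight_decay: 0.00005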