chuanqi305 / MobileNetv2-SSDLite

Caffe implementation of SSD and SSDLite detection on MobileNetV2, converted from TensorFlow.


The loss value stopped dropping once it came down to about 7.x?

wangzhe0623 opened this issue

Hi @chuanqi305, thanks for your great work. I ran into a problem. I used the scripts to convert the TF model to a Caffe model and got "deploy.caffemodel". After that, I used those weights to fine-tune on my 2-class dataset. The network configuration is exactly as you provided in "ssdlite/voc"; I only changed the layer names and the output channels of the "conf" layers. During training, with a learning rate of 0.0001 at the start, the loss dropped until it reached about 7 and then stopped, so the weights didn't converge at all. What could be wrong with it? Eager for your reply.
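
For reference, a minimal sketch of the kind of conf-layer change described above (the layer name, bottom blob, and the prior count of 6 are assumptions for illustration, not taken from the repo's prototxt): in SSD/SSDLite the conf layer's num_output must equal priors_per_location * num_classes, where num_classes includes the background class, so a 2-class setup would use 6 * 2 = 12. The num_classes field of the MultiBoxLoss layer in the training prototxt has to be changed to 2 as well.

# Illustrative only: layer/blob names and the 6 priors per location are assumptions.
layer {
  name: "conv_13/expand_mbox_conf_new"   # renamed so the old 21-class weights are not copied in
  type: "Convolution"
  bottom: "conv_13/expand"
  top: "conv_13/expand_mbox_conf_new"
  convolution_param {
    num_output: 12   # 6 priors per location * 2 classes (1 object class + background)
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler { type: "msra" }
    bias_filler { type: "constant" value: 0.0 }
  }
}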

I have the same problem...

Same question, have you solved it?

Same question.......

I0802 16:57:27.196552 21083 solver.cpp:259]     Train net output #0: mbox_loss = 6.57651 (* 1 = 6.57651 loss)
I0802 16:57:27.196557 21083 sgd_solver.cpp:138] Iteration 7310, lr = 0.05
I0802 16:57:28.975070 21083 solver.cpp:243] Iteration 7320, loss = 7.2598
I0802 16:57:28.975109 21083 solver.cpp:259]     Train net output #0: mbox_loss = 7.0054 (* 1 = 7.0054 loss)
I0802 16:57:28.975116 21083 sgd_solver.cpp:138] Iteration 7320, lr = 0.05
I0802 16:57:30.831727 21083 solver.cpp:243] Iteration 7330, loss = 7.34756
I0802 16:57:30.831763 21083 solver.cpp:259]     Train net output #0: mbox_loss = 6.6218 (* 1 = 6.6218 loss)
I0802 16:57:30.831768 21083 sgd_solver.cpp:138] Iteration 7330, lr = 0.05
I0802 16:57:32.776068 21083 solver.cpp:243] Iteration 7340, loss = 7.76406
I0802 16:57:32.776100 21083 solver.cpp:259]     Train net output #0: mbox_loss = 7.26238 (* 1 = 7.26238 loss)
I0802 16:57:32.776106 21083 sgd_solver.cpp:138] Iteration 7340, lr = 0.05
I0802 16:57:34.534003 21083 solver.cpp:243] Iteration 7350, loss = 7.65554
I0802 16:57:34.534036 21083 solver.cpp:259]     Train net output #0: mbox_loss = 7.10536 (* 1 = 7.10536 loss)
I0802 16:57:34.534042 21083 sgd_solver.cpp:138] Iteration 7350, lr = 0.05
I0802 16:57:36.399013 21083 solver.cpp:243] Iteration 7360, loss = 6.90834
I0802 16:57:36.399049 21083 solver.cpp:259]     Train net output #0: mbox_loss = 6.39814 (* 1 = 6.39814 loss)
I0802 16:57:36.399055 21083 sgd_solver.cpp:138] Iteration 7360, lr = 0.05
I0802 16:57:38.430330 21083 solver.cpp:243] Iteration 7370, loss = 7.53202
I0802 16:57:38.430369 21083 solver.cpp:259]     Train net output #0: mbox_loss = 7.61347 (* 1 = 7.61347 loss)

just continue to train

@allenwangcheng @jimchen2018
I solved it. My dataset is just quite difficult.

@wangzhe0623 Hello, I ran into the same problem as you. Could you tell me what you mean by a difficult dataset? Is your dataset too complex to train on?

@wangzhe0623 Thanks! But what was your solution? Did you just continue training, or did you change your dataset?

@wangzhe0623 I'd also like to know your solution, thanks!
How large is your training dataset, and how many iterations did you train for?

I have the same question, and I searched for a solution. I found that use_global_stats in batch_norm_param must be false; after I added this parameter, my training loss started to decrease. You can try it, maybe it helps! The BN parameters look like the following:
layer {
  name: "conv_1/expand/bn"
  type: "BatchNorm"
  bottom: "conv_1/expand"
  top: "conv_1/expand"
  batch_norm_param {
    use_global_stats: false
    eps: 1e-5
    #eps: 0.001
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
}

@Hanlos The value of use_global_stats is already false in the TRAIN phase, so I don't think the problem is related to use_global_stats. Did you modify any other values?


@wangzhe0623 Hello, I used WIDER FACE to train the SSDLite model. Could you please tell me how you use deploy.caffemodel to fine-tune on a 2-class dataset? Do you convert the COCO model to a 2-class model directly, or do you convert the COCO model to a VOC model and then use part of those weights to fine-tune your model? Please help me, thanks.
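
In case it helps while waiting for a reply, this is the usual Caffe fine-tuning pattern (a sketch only; the solver and model file names below are placeholders, not the repo's actual paths): pass the converted deploy.caffemodel via --weights, and Caffe copies weights into every layer whose name and blob shapes match, while renamed conf layers start from their weight_filler initialization.

# Sketch with placeholder file names; adjust the paths to your own solver and model files.
caffe train --solver=solver_train.prototxt --weights=deploy.caffemodel --gpu=0 2>&1 | tee train.log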


I have the same problem... Have you solved it? Please help me. Thanks!


Increase the learning rate and adopt an annealing schedule. Using this method, the loss decreased from 4 to 2 in two hours.
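
A sketch of what such a schedule could look like in solver.prototxt (the numbers are illustrative assumptions, not the values the commenter used): raise base_lr and anneal it with a multistep policy.

# Illustrative solver settings only; base_lr, stepvalue and gamma are assumptions.
base_lr: 0.001
lr_policy: "multistep"
gamma: 0.5          # halve the learning rate at each stepvalue
stepvalue: 20000
stepvalue: 40000
stepvalue: 60000
momentum: 0.9
weight_decay: 0.00005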