chainer / chainercv

ChainerCV: a Library for Deep Learning in Computer Vision

Training not converging with custom dataset on SSD

glefundes opened this issue · comments

Hello,
I'm using transfer learning to detect a new class with SSD (using the script referenced on issue #391)

I can run the training script without problems, but the loss stops decreasing (or decreases at a painfully slow pace) quite early compared to my previous experiences. Do you have any advice on how I could improve my training? I'm afraid that even if I reach a reasonably low loss, the current setup won't give me a very good model.

lr: 1e-4
batch size: 8 and later 16
Hardware: Tesla K80 12GB
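
For reference, I've kept lr fixed at 1e-4 the whole run, while step-decay schedules are more common for SSD training. A minimal sketch of the kind of schedule I mean (the base rate and step iterations here are illustrative placeholders, not necessarily the values from the #391 script):

```python
def step_decay_lr(iteration, base_lr=1e-3, gamma=0.1, steps=(80000, 100000)):
    """Step-decay learning-rate schedule: multiply base_lr by gamma
    each time the iteration count passes one of the step boundaries.
    All values here are placeholders for illustration."""
    lr = base_lr
    for step in steps:
        if iteration >= step:
            lr *= gamma
    return lr

# e.g. lr stays at 1e-3 until 80k iterations, then drops to 1e-4,
# then to 1e-5 after 100k iterations
for it in (0, 40000, 85000, 110000):
    print(it, step_decay_lr(it))
```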

epoch iteration lr main/loss main/loss/loc main/loss/conf validation/main/map
...
192 44500 0.0001 2.16565 0.401897 1.76375
192 44600 0.0001 2.07993 0.397752 1.68218
192 44700 0.0001 1.99905 0.352021 1.64703
193 44800 0.0001 1.98049 0.319228 1.66126
193 44900 0.0001 2.03209 0.326799 1.70529
194 45000 0.0001 2.07301 0.364076 1.70893
194 45100 0.0001 2.02232 0.348952 1.67337
195 45200 0.0001 1.95271 0.368567 1.58414
195 45300 0.0001 1.97565 0.325536 1.65011
195 45400 0.0001 1.96554 0.327404 1.63813
196 45500 0.0001 1.98018 0.334851 1.64533
196 45600 0.0001 2.08441 0.395084 1.68932
197 45700 0.0001 2.04036 0.353833 1.68653
197 45800 0.0001 1.96672 0.334456 1.63226
198 45900 0.0001 1.95015 0.323502 1.62665
198 46000 0.0001 2.04061 0.355772 1.68484
198 46100 0.0001 1.90149 0.338726 1.56277
199 46200 0.0001 2.16579 0.439036 1.72675
199 46300 0.0001 2.05013 0.364389 1.68574
200 46400 0.0001 2.08604 0.403595 1.68244
200 46500 0.0001 1.85117 0.27316 1.57801
201 46600 0.0001 2.05074 0.375543 1.6752
201 46700 0.0001 1.90858 0.342383 1.5662
201 46800 0.0001 1.99748 0.361442 1.63604
202 46900 0.0001 1.94056 0.33401 1.60655
202 47000 0.0001 1.99963 0.370368 1.62927
203 47100 0.0001 1.89865 0.325263 1.57339
203 47200 0.0001 1.96558 0.323745 1.64183
204 47300 0.0001 1.89367 0.313571 1.5801
204 47400 0.0001 2.07271 0.380299 1.69241
204 47500 0.0001 2.08847 0.412245 1.67623
205 47600 0.0001 1.96052 0.309089 1.65143
205 47700 0.0001 1.92523 0.375674 1.54956
206 47800 0.0001 2.01367 0.405505 1.60816
206 47900 0.0001 1.96473 0.322251 1.64248
207 48000 0.0001 2.00153 0.344448 1.65709
207 48100 0.0001 1.91378 0.345453 1.56832
207 48200 0.0001 1.8431 0.250074 1.59303
208 48300 0.0001 1.9456 0.350303 1.5953
208 48400 0.0001 1.86085 0.308576 1.55227
209 48500 0.0001 1.78203 0.273647 1.50838
209 48600 0.0001 1.87274 0.320894 1.55185
210 48700 0.0001 1.90086 0.32055 1.58031
210 48800 0.0001 1.92857 0.330096 1.59847
211 48900 0.0001 1.86798 0.292891 1.57509
211 49000 0.0001 1.82894 0.283909 1.54503
211 49100 0.0001 1.87979 0.308999 1.57079
212 49200 0.0001 1.98635 0.370226 1.61612
212 49300 0.0001 2.02747 0.349052 1.67842
213 49400 0.0001 1.93904 0.339171 1.59987
213 49500 0.0001 2.01575 0.35706 1.65869
214 49600 0.0001 1.96424 0.367705 1.59654
214 49700 0.0001 2.00177 0.391018 1.61075
214 49800 0.0001 1.9873 0.353817 1.63349
215 49900 0.0001 1.95175 0.363682 1.58807
215 50000 0.0001 1.96208 0.327248 1.63483 0.769549
216 50100 0.0001 1.92321 0.347427 1.57578
217 50200 0.0001 1.91955 0.319467 1.60008
218 50300 0.0001 1.89115 0.304973 1.58618
219 50400 0.0001 1.89752 0.359253 1.53827
220 50500 0.0001 1.92998 0.366908 1.56308
220 50600 0.0001 1.90929 0.339895 1.56939
221 50700 0.0001 1.84646 0.312934 1.53353
222 50800 0.0001 1.92209 0.321949 1.60014
223 50900 0.0001 1.82422 0.323542 1.50068
224 51000 0.0001 1.8846 0.316323 1.56828
225 51100 0.0001 1.95603 0.365901 1.59013
226 51200 0.0001 1.88338 0.352842 1.53054
226 51300 0.0001 2.02173 0.389121 1.63261
227 51400 0.0001 1.86447 0.352674 1.5118
228 51500 0.0001 1.86548 0.304266 1.56121
229 51600 0.0001 1.87126 0.332082 1.53917
230 51700 0.0001 1.81064 0.325461 1.48518
231 51800 0.0001 1.88888 0.317625 1.57126
232 51900 0.0001 1.80821 0.299786 1.50843
233 52000 0.0001 1.85853 0.329047 1.52948
233 52100 0.0001 1.81246 0.311888 1.50057
234 52200 0.0001 2.00427 0.338623 1.66564
235 52300 0.0001 1.82236 0.320098 1.50227
236 52400 0.0001 1.92571 0.32956 1.59616
237 52500 0.0001 1.78304 0.29227 1.49077
238 52600 0.0001 1.84892 0.312503 1.53642
239 52700 0.0001 1.94207 0.333517 1.60856
239 52800 0.0001 1.74055 0.297809 1.44274
240 52900 0.0001 1.86484 0.315364 1.54947
241 53000 0.0001 1.7987 0.276109 1.52259
242 53100 0.0001 1.85678 0.323813 1.53297
243 53200 0.0001 1.72929 0.282117 1.44717
244 53300 0.0001 1.85883 0.330652 1.52817
245 53400 0.0001 1.75449 0.282195 1.4723
245 53500 0.0001 1.82358 0.297941 1.52563
246 53600 0.0001 1.74407 0.294349 1.44972
247 53700 0.0001 1.86955 0.338715 1.53084
248 53800 0.0001 1.92192 0.331563 1.59035
249 53900 0.0001 1.78261 0.285467 1.49714
250 54000 0.0001 1.86579 0.317195 1.54859
251 54100 0.0001 1.89726 0.309106 1.58815
251 54200 0.0001 1.73645 0.280518 1.45593
252 54300 0.0001 1.90704 0.33079 1.57625
253 54400 0.0001 1.81646 0.281853 1.5346
254 54500 0.0001 1.92731 0.346972 1.58034
255 54600 0.0001 1.65952 0.248516 1.41101
256 54700 0.0001 1.81472 0.298185 1.51654
257 54800 0.0001 1.79672 0.294534 1.50218
258 54900 0.0001 1.84061 0.310606 1.53001
258 55000 0.0001 1.91252 0.370498 1.54202
259 55100 0.0001 1.86228 0.331397 1.53089
260 55200 0.0001 1.81712 0.309631 1.50749
261 55300 0.0001 1.77558 0.29885 1.47673
262 55400 0.0001 1.8484 0.289876 1.55853
263 55500 0.0001 1.79002 0.277208 1.51281
264 55600 0.0001 1.88899 0.360142 1.52885
264 55700 0.0001 1.82272 0.303158 1.51956
265 55800 0.0001 1.75621 0.278451 1.47776
266 55900 0.0001 1.8645 0.317901 1.5466
267 56000 0.0001 1.88639 0.34495 1.54144
268 56100 0.0001 1.90999 0.366324 1.54367
269 56200 0.0001 1.77732 0.272058 1.50527
270 56300 0.0001 1.81813 0.307901 1.51023
270 56400 0.0001 1.81485 0.301783 1.51306
271 56500 0.0001 1.86718 0.319289 1.54789
272 56600 0.0001 1.76483 0.30666 1.45817
273 56700 0.0001 1.79172 0.307662 1.48406
274 56800 0.0001 1.85992 0.320592 1.53933
275 56900 0.0001 1.72376 0.288744 1.43502
276 57000 0.0001 1.8052 0.300146 1.50506
277 57100 0.0001 1.88341 0.332651 1.55076
277 57200 0.0001 1.75538 0.294625 1.46075
278 57300 0.0001 1.80665 0.293577 1.51308
279 57400 0.0001 1.82841 0.280543 1.54787
280 57500 0.0001 1.74956 0.29359 1.45597
281 57600 0.0001 1.79544 0.304424 1.49101
282 57700 0.0001 1.76389 0.267721 1.49617
283 57800 0.0001 1.78676 0.320731 1.46603
283 57900 0.0001 1.74697 0.303794 1.44317
284 58000 0.0001 1.78893 0.304409 1.48453
285 58100 0.0001 1.81004 0.31383 1.49622
286 58200 0.0001 1.78393 0.299508 1.48442

It is hard to say; there can be several possible causes:

  1. Your dataset annotations are wrong.
  2. Your dataset class is wrong, and the loaded data is inappropriate.
  3. The hyperparameters (lr and others) are not good.

Because it is your own dataset, we cannot give any advice about 1 and 2. As for 3, it is a well-known issue, but the only thing you can do is run experiments again and again and test other candidates.
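
For 1 and 2, one generic check you can run yourself is to validate every sample your dataset class yields against ChainerCV's bounding-box convention, which is `(ymin, xmin, ymax, xmax)` in pixel coordinates. A rough sketch (the function name and the exact checks are just an example, not part of ChainerCV):

```python
import numpy as np

def check_sample(bbox, label, n_fg_class, img_height, img_width):
    """Return a list of problems found in one (bbox, label) pair.
    Assumes ChainerCV's (ymin, xmin, ymax, xmax) bbox convention;
    an empty list means the sample passed these basic checks."""
    bbox = np.asarray(bbox, dtype=np.float32)
    label = np.asarray(label)
    problems = []
    if bbox.ndim != 2 or bbox.shape[1] != 4:
        problems.append('bbox must have shape (R, 4)')
        return problems
    # Degenerate boxes often mean (x, y) and (y, x) were swapped
    if np.any(bbox[:, 0] >= bbox[:, 2]) or np.any(bbox[:, 1] >= bbox[:, 3]):
        problems.append('degenerate box: ymin >= ymax or xmin >= xmax')
    # Boxes outside the image suggest wrong scaling or wrong image size
    if (np.any(bbox < 0)
            or np.any(bbox[:, 2] > img_height)
            or np.any(bbox[:, 3] > img_width)):
        problems.append('box outside image bounds')
    # Labels must be 0-based foreground class indices
    if np.any(label < 0) or np.any(label >= n_fg_class):
        problems.append('label outside [0, n_fg_class)')
    return problems
```

Running this over the whole dataset (and, additionally, visually inspecting a few samples) usually surfaces annotation or loader bugs quickly.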