jay-mahadeokar / pynetbuilder

pyNetBuilder is a modular, Pythonic interface with built-in modules for generating popular Caffe prototxt network definition files.

Training does not converge for resnet50-ssd on pascal VOC dataset

kristellmarisse opened this issue · comments

I am training SSD-ResNet50 on the Pascal VOC dataset. Since I have a smaller GPU (GTX 960, 4 GB), I reduced the batch size for training. The training loss started at 14 and went down to 7 after 7k iterations, but it has not decreased since then. Is this because I changed the batch size?

What is your batch size? I get the best results with a batch size of 32 (8 per GPU × 4 GPUs in parallel). I also found that a batch size as low as 14 converges, though the results are not the best. I would also look at the running average of the training loss to see what's happening; see the training plot here.
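For anyone wanting to check the running average mentioned above, here is a minimal sketch. It assumes the per-iteration loss values have already been extracted from the training log into a plain Python list; the function name `running_average` and the `window` parameter are made up for illustration and are not part of pynetbuilder or Caffe.

```python
from collections import deque


def running_average(losses, window=100):
    """Return the moving average of `losses` over the last `window` values.

    Hypothetical helper: `losses` is assumed to be a list of per-iteration
    training loss values parsed from the log beforehand.
    """
    recent = deque(maxlen=window)  # holds at most `window` latest values
    total = 0.0
    averaged = []
    for loss in losses:
        # When the window is full, the oldest value is about to be evicted,
        # so remove it from the running total first.
        if len(recent) == window:
            total -= recent[0]
        recent.append(loss)
        total += loss
        averaged.append(total / len(recent))
    return averaged


if __name__ == "__main__":
    # Toy values only, not real training output.
    print(running_average([1.0, 2.0, 3.0, 4.0], window=2))
```

With a very small batch size the raw loss is noisy, so a smoothed curve like this makes it easier to tell a real plateau from noise.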

Thank you for the leads. My batch size was only 2 (the most I could squeeze into my GPU memory). Is it OK to increase the effective batch size by modifying the iter_size parameter in solver.prototxt? I usually use this trick in py-faster-rcnn.
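To make the arithmetic behind the iter_size trick concrete: Caffe accumulates gradients over iter_size forward/backward passes before each weight update, so the effective batch size is roughly the per-GPU batch size times iter_size times the number of GPUs. The sketch below only does that arithmetic; the helper names are hypothetical, and it has not been confirmed in this thread that the setting makes this model converge. Note also that batch normalization statistics are still computed on each small mini-batch, so accumulation may not behave exactly like a genuinely larger batch.

```python
import math


def effective_batch_size(batch_size, iter_size=1, num_gpus=1):
    """Samples contributing to one weight update (hypothetical helper).

    Assumes `batch_size` is the per-GPU value from the train prototxt and
    `iter_size` is the value set in solver.prototxt.
    """
    return batch_size * iter_size * num_gpus


def iter_size_for_target(target_batch, batch_size, num_gpus=1):
    """Smallest iter_size that reaches at least `target_batch` samples."""
    return math.ceil(target_batch / (batch_size * num_gpus))


if __name__ == "__main__":
    # Setup reported above: 8 per GPU on 4 GPUs, no accumulation.
    print(effective_batch_size(8, iter_size=1, num_gpus=4))
    # Single 4 GB GPU with batch size 2, aiming for 32 samples per update.
    print(iter_size_for_target(32, batch_size=2))
```

In solver.prototxt this would correspond to adding a line such as `iter_size: 16` while keeping the batch size at 2 in the data layer.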

@kristellmarisse I have not tried that setting. Maybe you could try a smaller network so that a bigger batch size fits in memory? See a few other ResNet models shared here, pretrained on ImageNet, which give decent top-1 accuracy. The "# params" field in the comparison tables gives an indication of the model size.

Thank you for sharing more models.

By the way, could you share the specs of the GPU on which you trained the ResNet+SSD model?

I think it's a K80; it has 11 GB of memory.