amoudgl / pygoturn

PyTorch implementation of GOTURN object tracker: Learning to Track at 100 FPS with Deep Regression Networks (ECCV 2016)

The loss does not drop any more on the got1k dataset?

wangxiao5791509 opened this issue · comments

Hi, I tried this code and trained it on 1k videos randomly selected from the got-10k dataset. The loss dropped normally from about 800+ to 100+ (taking about 98,500 iterations). However, the loss does not drop any further, as shown below:
[training goturn2.0 joint conv+fc] step = 98534/500000, loss = 173.214868, time = 1.017638
[training goturn2.0 joint conv+fc] step = 98535/500000, loss = 185.998779, time = 1.318390
[training goturn2.0 joint conv+fc] step = 98536/500000, loss = 201.093970, time = 1.261940
[training goturn2.0 joint conv+fc] step = 98537/500000, loss = 162.243616, time = 1.298599
[training goturn2.0 joint conv+fc] step = 98538/500000, loss = 189.368579, time = 1.428025
[training goturn2.0 joint conv+fc] step = 98539/500000, loss = 186.761877, time = 1.108085
[training goturn2.0 joint conv+fc] step = 98540/500000, loss = 190.769653, time = 1.227486
[training goturn2.0 joint conv+fc] step = 98541/500000, loss = 160.548572, time = 1.023933
[training goturn2.0 joint conv+fc] step = 98542/500000, loss = 166.954614, time = 1.313365
[training goturn2.0 joint conv+fc] step = 98543/500000, loss = 158.464966, time = 1.047702
[training goturn2.0 joint conv+fc] step = 98544/500000, loss = 190.491577, time = 1.230863
[training goturn2.0 joint conv+fc] step = 98545/500000, loss = 190.102234, time = 0.995957
[training goturn2.0 joint conv+fc] step = 98546/500000, loss = 185.101526, time = 1.247413
[training goturn2.0 joint conv+fc] step = 98547/500000, loss = 177.304321, time = 1.103242
[training goturn2.0 joint conv+fc] step = 98548/500000, loss = 162.095325, time = 1.248142
[training goturn2.0 joint conv+fc] step = 98549/500000, loss = 155.143604, time = 1.182452
[training goturn2.0 joint conv+fc] step = 98550/500000, loss = 215.679712, time = 1.089420
[training goturn2.0 joint conv+fc] step = 98551/500000, loss = 172.392944, time = 1.197206
[training goturn2.0 joint conv+fc] step = 98552/500000, loss = 194.520471, time = 0.962296
[training goturn2.0 joint conv+fc] step = 98553/500000, loss = 213.370923, time = 1.385756
[training goturn2.0 joint conv+fc] step = 98554/500000, loss = 208.281494, time = 1.098781
[training goturn2.0 joint conv+fc] step = 98555/500000, loss = 154.950000, time = 1.852570
[training goturn2.0 joint conv+fc] step = 98556/500000, loss = 172.599121, time = 0.953801
[training goturn2.0 joint conv+fc] step = 98557/500000, loss = 152.899561, time = 1.244250
[training goturn2.0 joint conv+fc] step = 98558/500000, loss = 177.016675, time = 1.057501

So, have you seen a similar situation before? How can I get this loss to drop further? At the current stage, when I run the tracker the tracking results are really bad, even though it is fast. Looking forward to your reply. Thanks.

I faced this issue of the loss stagnating around 100 when I didn't normalize the images the way the PyTorch pretrained ImageNet models expect. Please check your got10k dataloader class for this.
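For reference, a minimal sketch of the standard normalization that torchvision's pretrained ImageNet models expect; whether the got10k dataloader in question applies exactly this transform is an assumption to verify against its code:

```python
import torch
from torchvision import transforms

# Standard ImageNet statistics used by torchvision's pretrained models.
# ToTensor() first scales pixel values from [0, 255] uint8 to [0, 1] float,
# then Normalize subtracts the per-channel mean and divides by the std.
imagenet_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Example usage on a PIL image `img`:
# x = imagenet_transform(img)   # CHW float tensor, normalized
```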

@amoudgl Thanks for your reply. I directly trained GOTURN with this code on the got-1k videos, and the images are normalized exactly as you implemented. I am sure this operation is applied to all the loaded images. It's really strange.

@amoudgl Hi, I solved this issue today. It was caused by the line torch.nn.utils.clip_grad_norm_(model.parameters(), 0.25) that I had added, which kept the loss stuck near 200. After I removed this line, the loss drops significantly. Thanks for your kind explanation.
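For context, a minimal sketch of where such a clipping call would sit in a training step (this is not the repo's actual training loop; train_step and its arguments are illustrative, and the comment on why a 0.25 max norm stalls training is an assumption about this particular setup):

```python
import torch

def train_step(model, optimizer, loss_fn, x, y, clip=None):
    """One hypothetical training step; `clip` mirrors the removed line."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    if clip is not None:
        # With a regression loss in the hundreds, the gradient norm is
        # typically far above 0.25, so clipping to that max norm likely
        # rescales nearly every update to a tiny step, stalling training.
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
    optimizer.step()
    return loss.item()

# The issue's setup corresponds to clip=0.25; removing the clip
# (clip=None) lets the loss continue to decrease.
```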

Sounds great. :)