loss during training

Question


lemony1314 opened this issue 7 years ago

First, thanks for your work.
I have trained for nearly 300,000 iterations using the code you provided. The training images were augmented to 5,820, and train_num is 748608.
The test loss converges to almost 0.4. Is that reasonable?

Zehao Huang · Answer 1 · Thu Mar 09 2017 23:12:32 GMT+0800 (China Standard Time)

Hi, 0.4 is OK. You can test your trained model on Set5 or Set14 and check the PSNR. You can also have a look at my training log: https://raw.githubusercontent.com/huangzehao/caffe-vdsr/master/Train/VDSR_291_multiscale_adam.log.
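For reference, PSNR is 10 · log10(MAX² / MSE) between the ground-truth and super-resolved images. The sketch below is a plain-Python version, not the repository's test script. It assumes 8-bit pixel values (MAX = 255), grayscale input (SR benchmarks usually compare the luminance channel), and an optional `shave` border crop; the function name and the cropping convention are assumptions for illustration.

```python
import math
from typing import Sequence

Image = Sequence[Sequence[float]]  # rows of pixel values, e.g. the Y channel


def psnr(ref: Image, est: Image, shave: int = 0, max_val: float = 255.0) -> float:
    """Return the PSNR (dB) between two same-sized grayscale images.

    `shave` drops that many pixels from each border before comparing.
    Cropping `scale` pixels is an assumed convention here, not taken from
    caffe-vdsr's own test scripts.
    """
    h, w = len(ref), len(ref[0])
    if len(est) != h or any(len(row) != w for row in est):
        raise ValueError("images must have the same size")
    rows = range(shave, h - shave)
    cols = range(shave, w - shave)
    n = len(rows) * len(cols)
    if n == 0:
        raise ValueError("shave removes the whole image")
    # Mean squared error over the (cropped) pixels.
    mse = sum((ref[i][j] - est[i][j]) ** 2 for i in rows for j in cols) / n
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


ref = [[10.0, 20.0], [30.0, 40.0]]
est = [[11.0, 19.0], [31.0, 39.0]]  # every pixel off by 1
print(round(psnr(ref, est), 2))  # 48.13
```

An error of one gray level everywhere gives an MSE of 1, so the PSNR is 10 · log10(255²) ≈ 48.13 dB.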

lemony1314 · Answer 2 · Fri Mar 10 2017 10:18:20 GMT+0800 (China Standard Time)

@huangzehao Thanks for your reply.
I find that the PSNR does not increase or converge as the number of iterations grows:
At 170,000 iterations, the PSNR on butterfly is 29.922 dB.
At 250,000 iterations, the PSNR on butterfly is 29.898 dB.
At 360,000 iterations, the PSNR on butterfly is 29.917 dB.
At 450,000 iterations, the PSNR on butterfly is 29.714 dB.
At 520,000 iterations, the PSNR on butterfly is 29.978 dB.
Is this strange?
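A quick summary of the five butterfly numbers above helps show how large the fluctuation actually is. This is only arithmetic on the values reported in this comment. All five lie within about 0.26 dB of each other and show no monotonic trend.

```python
from statistics import mean, pstdev

# Butterfly PSNR (dB) at each checkpoint, copied from the comment above.
butterfly_psnr = {
    170_000: 29.922,
    250_000: 29.898,
    360_000: 29.917,
    450_000: 29.714,
    520_000: 29.978,
}

values = list(butterfly_psnr.values())
spread = max(values) - min(values)
print(f"mean  = {mean(values):.3f} dB")
print(f"range = {spread:.3f} dB")
print(f"std   = {pstdev(values):.3f} dB")
```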

lemony1314 · Answer 3 · Mon Mar 13 2017 22:31:04 GMT+0800 (China Standard Time)

@huangzehao Looking forward to your reply! Thank you!

Zehao Huang · Answer 4 · Tue Mar 14 2017 09:53:58 GMT+0800 (China Standard Time)

Hi, sorry for the late reply.
You should benchmark your model on the full datasets, including Set5, Set14 and BSD100.
The PSNR of a single image does not always represent the performance of your model.
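Benchmarking on full datasets usually means averaging the per-image PSNR over every image in each test set. The sketch below assumes that convention (rather than pooling the MSE over the whole set). The `benchmark` helper and the input layout (a mapping from dataset name to a list of ground-truth/output image pairs) are hypothetical, not part of caffe-vdsr, and no border crop is applied.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale rows, e.g. the Y channel


def psnr(ref: Image, est: Image, max_val: float = 255.0) -> float:
    """PSNR (dB) between two same-sized grayscale images, no border crop."""
    n = len(ref) * len(ref[0])
    mse = sum((a - b) ** 2 for ra, rb in zip(ref, est) for a, b in zip(ra, rb)) / n
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


def benchmark(datasets: Dict[str, List[Tuple[Image, Image]]]) -> Dict[str, float]:
    """Average per-image PSNR for each named test set (e.g. Set5, Set14, BSD100)."""
    return {name: mean(psnr(ref, est) for ref, est in pairs)
            for name, pairs in datasets.items()}
```

Comparing these per-dataset averages across checkpoints smooths out the swings that a single image such as butterfly can show.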

XiangyuXu · Answer 5 · Sat Apr 01 2017 22:53:01 GMT+0800 (China Standard Time)

Hi, I have a question about training. How many iterations are needed to train a reasonable VDSR model?
Based on the discussion above, it seems that at least 2 × 10^5 iterations are needed.
However, the paper states that training takes less than 4 hours, which I don't think is enough to finish more than 10^5 iterations.

Zehao Huang · Answer 6 · Sun Apr 02 2017 12:19:16 GMT+0800 (China Standard Time)

@xuxy09 Hi, please check #29.

XiangyuXu · Answer 7 · Sun Apr 02 2017 12:27:55 GMT+0800 (China Standard Time)

Thanks. It seems we share the same concern about the training time. I agree that it is impossible to finish 80 epochs in 4 hours on one Titan Z, especially since Caffe is generally faster than MatConvNet. I think 24 hours would be a more reasonable estimate.
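The time argument can be made concrete by converting epochs into iterations. The sketch below uses train_num = 748608 from the first post (assumed to be the number of training patches) and assumes a mini-batch size of 64, as described in the VDSR paper; the batch size in caffe-vdsr's solver may differ. Under these assumptions, 80 epochs is 935,760 iterations, so finishing in 4 hours would need about 65 iterations per second (roughly 15 ms per iteration), versus about 10.8 iterations per second for a 24-hour budget.

```python
TRAIN_NUM = 748_608  # number of training samples reported earlier in this thread
BATCH_SIZE = 64      # assumption: VDSR paper's mini-batch size; check your solver


def iterations_per_epoch(train_num: int, batch_size: int) -> int:
    """Iterations needed to see every training sample once (ceiling division)."""
    return (train_num + batch_size - 1) // batch_size


def required_iter_per_sec(epochs: int, hours: float,
                          train_num: int = TRAIN_NUM,
                          batch_size: int = BATCH_SIZE) -> float:
    """Throughput needed to finish `epochs` epochs within `hours` hours."""
    total_iters = epochs * iterations_per_epoch(train_num, batch_size)
    return total_iters / (hours * 3600)


rate = required_iter_per_sec(epochs=80, hours=4)
print(f"{rate:.1f} iterations/s ({1000 / rate:.1f} ms per iteration)")
```

Comparing the printed rate with the iterations per second that Caffe reports for your own setup gives a quick sanity check on any quoted training time.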

WangChaofeng · Answer 8 · Sat Apr 15 2017 16:42:54 GMT+0800 (China Standard Time)

Hello @huangzehao, I have the same question about the test loss value. In your training log, the test loss converges to 0.4 after 15 epochs, and the results are fine on Set5, but not on Set14 and BSD100. How many epochs does it take for Set14 and BSD100 to reach the results reported in the paper? Hoping for your reply, thank you!

Zehao Huang · Answer 9 · Sun Apr 16 2017 12:29:25 GMT+0800 (China Standard Time)

@ChaofWang Hi, you can test the trained model on Set14 and BSD100. 20 or 30 epochs is enough.
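To find which Caffe snapshots correspond to "20 or 30 epochs", the epoch count can be converted into a solver iteration. This sketch again assumes train_num = 748608 from the first post and a mini-batch size of 64 (the VDSR paper's value, not confirmed for this repository); the helper name is hypothetical. Under those assumptions, 20 to 30 epochs is roughly 234k to 351k iterations.

```python
TRAIN_NUM = 748_608  # training samples, as reported at the top of this thread
BATCH_SIZE = 64      # assumed mini-batch size; use the value in your own solver


def epochs_to_iterations(epochs: float, train_num: int = TRAIN_NUM,
                         batch_size: int = BATCH_SIZE) -> int:
    """Convert an epoch count into the solver iteration it corresponds to."""
    iters_per_epoch = (train_num + batch_size - 1) // batch_size
    return round(epochs * iters_per_epoch)


for epochs in (15, 20, 30):
    print(f"{epochs} epochs ~ {epochs_to_iterations(epochs):,} iterations")
```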