greatlog / DAN

This is an official implementation of Unfolding the Alternating Optimization for Blind Super Resolution

Problem of 'NaN' loss value during training

YuqiangY opened this issue

Thank you very much for your work.
As the title says, the loss value suddenly becomes NaN at iteration 22800 (and a second time at 29200).
Have you ever encountered this kind of error?

Yes, this situation occurs sometimes. A workaround is to resume training from the last normal checkpoint.
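The workaround above can be sketched as a guard in the training loop: checkpoint periodically, and when the loss turns NaN, roll back to the last normal state. This is a minimal pure-Python sketch, not DAN's actual training code; `step_fn` is a hypothetical stand-in for one optimizer step. Note that if `step_fn` is fully deterministic the same NaN would recur, so in practice the rollback relies on batch shuffling producing different data the second time.

```python
import copy
import math

def train_with_nan_guard(params, step_fn, num_iters, ckpt_every=100):
    """Toy training loop that resumes from the last normal state on NaN loss.

    step_fn(params, i) performs one (hypothetical) training step and
    returns (new_params, loss).
    """
    # The initial state is the first "last normal" checkpoint.
    ckpt_params, ckpt_iter = copy.deepcopy(params), 0
    i = 0
    rollbacks = 0
    while i < num_iters:
        params, loss = step_fn(params, i)
        if math.isnan(loss):
            # Loss diverged: restore the last normal training state and retry.
            params = copy.deepcopy(ckpt_params)
            i = ckpt_iter
            rollbacks += 1
            continue
        i += 1
        if i % ckpt_every == 0:
            # Save a known-good checkpoint.
            ckpt_params, ckpt_iter = copy.deepcopy(params), i
    return params, rollbacks
```

In a real PyTorch run the checkpoint would be written with `torch.save` (model, optimizer, and scheduler state) rather than kept in memory.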

Thanks, that is how I worked around the problem.
However, do you have any clues about what causes this error?

Sorry, I have not figured it out yet. If you have any ideas, please tell me. Thank you.

I couldn't find the key to the problem either. In the last 20 hours this error has occurred several times, especially once the number of iterations exceeds 115000. Is that common?

In my case, it occurred twice during 400000 iterations. The frequency seems random. It may be an inherent drawback of the proposed method, since DAN is effectively a recurrent neural network (RNN). Perhaps some ideas from RNN training could be borrowed to stabilize the training.
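One standard idea from the RNN literature is gradient clipping by global norm, which counters exploding gradients (in PyTorch this is `torch.nn.utils.clip_grad_norm_`). A minimal pure-Python sketch of the scaling rule, with gradients represented as a flat list of floats for illustration:

```python
import math

def clip_grad_norm(grads, max_norm):
    """Rescale grads so their global L2 norm is at most max_norm.

    If the norm is already within the limit, the gradients pass through
    unchanged; otherwise every component is scaled by the same factor,
    preserving the gradient direction.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads
```

Whether clipping actually prevents the NaN here is an open question (the divergence could also come from the data or from numerical issues in the kernel-estimation branch), but it is a cheap experiment to try.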

OK, thanks for your reply.