EdGENetworks / attention-networks-for-classification

Hierarchical Attention Networks for Document Classification in PyTorch

Loss can start as NaN

JoaoLages opened this issue · comments

Any solution for this, or any idea why it happens? (There should be a division by zero somewhere.) It happens sometimes.

Can you give me some more context: is it starting as NaN, or is it converging to NaN?

It starts as NaN and then never converges. If it starts as anything other than NaN, I have never seen it converge to NaN.

This problem didn't occur when I tested it. Which dataset were you using?

I have been using a different dataset, it's true, which unfortunately I cannot share. I was wondering whether you had any idea why it could happen and how to avoid it, though.

@JoaoLages
I encountered the same NaN problem under some parameter settings (it usually happens when the hidden dimension is small). After debugging, I found it is because two parameters (bias_word and bias_sent) are never initialized, so they may contain NaN.
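A quick check like the sketch below can confirm which parameters already hold NaN before the first forward pass (report_nan_parameters is just an illustrative helper, not code from this repo):

```python
import torch

def report_nan_parameters(model):
    # Print every parameter tensor that already contains NaN values,
    # e.g. a bias that was never initialized, before the first training step.
    for name, param in model.named_parameters():
        if torch.isnan(param.data).any():
            print(name, "contains NaN")
```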
Add self.bias_word.data.uniform_(-0.1, 0.1) to __init__() of AttentionWordRNN.
Add self.bias_sent.data.uniform_(-0.1, 0.1) to __init__() of AttentionSentRNN.
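Roughly, the change looks like the sketch below. The tensor shapes, default hidden sizes, and the rest of the constructors are assumptions for illustration; only the two uniform_ calls on bias_word and bias_sent are the actual fix:

```python
import torch
import torch.nn as nn

class AttentionWordRNN(nn.Module):
    def __init__(self, word_gru_hidden=100):
        super(AttentionWordRNN, self).__init__()
        # ... other parameters and layers as in the repo ...
        self.bias_word = nn.Parameter(torch.Tensor(2 * word_gru_hidden, 1))
        # Added: without this line the bias tensor is left uninitialized
        # and can contain NaN, which propagates into the loss.
        self.bias_word.data.uniform_(-0.1, 0.1)

class AttentionSentRNN(nn.Module):
    def __init__(self, sent_gru_hidden=100):
        super(AttentionSentRNN, self).__init__()
        # ... other parameters and layers as in the repo ...
        self.bias_sent = nn.Parameter(torch.Tensor(2 * sent_gru_hidden, 1))
        # Added: same fix for the sentence-level attention bias.
        self.bias_sent.data.uniform_(-0.1, 0.1)
```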

It solved my problem. Hope this can help yours!