EdGENetworks / attention-networks-for-classification

Hierarchical Attention Networks for Document Classification in PyTorch

Loss can start as NaN

JoaoLages opened this issue · comments

Any solution for this, or any idea why it happens? (There should be a division by zero somewhere.) It happens sometimes.

Can you give me some more context: is it starting as NaN, or is it converging to NaN?

It starts as NaN and then never converges. If it starts as anything other than NaN, I have never seen it converge to NaN.

This problem didn't occur when I tested it. Which dataset were you using?

I have been using a different dataset, it's true, which unfortunately I cannot share. I was wondering whether you had any idea why it could happen and how to avoid it, though.

@JoaoLages
I encountered the same NaN problem under some parameter settings (it usually happens when the hidden dimension is small). After debugging, I found it is because two parameters (bias_word and bias_sent) are never initialized, so they may contain NaN.
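A quick check like the sketch below can confirm which parameters already hold NaN before the first forward pass (report_nan_parameters is just an illustrative helper, not code from this repo):

```python
import torch

def report_nan_parameters(model):
    # Print every parameter tensor that already contains NaN values,
    # e.g. a bias that was never initialized, before the first training step.
    for name, param in model.named_parameters():
        if torch.isnan(param.data).any():
            print(name, "contains NaN")
```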
Add self.bias_word.data.uniform_(-0.1, 0.1) to __init__() of AttentionWordRNN.
Add self.bias_sent.data.uniform_(-0.1, 0.1) to __init__() of AttentionSentRNN.
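Roughly, the change looks like the sketch below. The tensor shapes, default hidden sizes, and the rest of the constructors are assumptions for illustration; only the two uniform_ calls on bias_word and bias_sent are the actual fix:

```python
import torch
import torch.nn as nn

class AttentionWordRNN(nn.Module):
    def __init__(self, word_gru_hidden=100):
        super(AttentionWordRNN, self).__init__()
        # ... other parameters and layers as in the repo ...
        self.bias_word = nn.Parameter(torch.Tensor(2 * word_gru_hidden, 1))
        # Added: without this line the bias tensor is left uninitialized
        # and can contain NaN, which propagates into the loss.
        self.bias_word.data.uniform_(-0.1, 0.1)

class AttentionSentRNN(nn.Module):
    def __init__(self, sent_gru_hidden=100):
        super(AttentionSentRNN, self).__init__()
        # ... other parameters and layers as in the repo ...
        self.bias_sent = nn.Parameter(torch.Tensor(2 * sent_gru_hidden, 1))
        # Added: same fix for the sentence-level attention bias.
        self.bias_sent.data.uniform_(-0.1, 0.1)
```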

It solved my problem. Hope this can help yours!