clab / dynet

DyNet: The Dynamic Neural Network Toolkit


NaN or Inf detected when computing log_softmax

eahogue opened this issue · comments

As soon as I start training a model, I get "NaN or Inf detected" when this line executes:

logloss = log_softmax(f_i, valid_frames)

Note that this is with immediate_compute and check_validity turned on; with them turned off, the error just seems to happen a little later in the process.
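For reference, I'm enabling those flags when I renew the computation graph. A minimal sketch of that part (the surrounding model code is omitted, and this is a simplified stand-in for my actual setup):

import dynet as dy

# immediate_compute evaluates each expression as soon as it is added to
# the graph; check_validity then raises on the first NaN/Inf it sees.
dy.renew_cg(immediate_compute=True, check_validity=True)

# ... build f_i from the model, then:
# logloss = log_softmax(f_i, valid_frames)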

In the most recent run, the values being passed to log_softmax are:

f_i = expression 1630/2
valid_frames = [204, 28]

Can someone help me understand why this input produces an Inf or NaN? I've looked through the existing issues, and the cause seems to be different in each one.

Here is an example of what log_softmax returns (from a different run than the one above):

logloss = expression 3429/2
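(As far as I understand, "expression 3429/2" is just DyNet's printed form of an expression, i.e. expression index / graph version, not its value; to see the actual numbers you have to force evaluation, for example:)

logloss_value = logloss.npvalue()  # forces evaluation; returns a NumPy array
print(logloss_value)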

Thanks!

I ran into a similar problem when using immediate_compute and check_validity. I can apparently fix it simply by removing the restriction argument from my log_softmax() call. The documentation for log_softmax() says that "all elements not included in restriction are set to negative infinity," so I suspect that when immediate_compute and/or check_validity are on, the validity check catches the -inf values that the restriction deliberately put there and flags them as problems. If that is actually what's going on, then I think this is a bug: those -inf values are intentional, not a sign of numerical failure.
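In the meantime, masking with a large but finite negative value instead of using the restriction seems to sidestep the -inf entirely. A rough sketch of what I mean (masked_log_softmax and the -1e9 constant are placeholder names/choices of mine, not DyNet API):

import dynet as dy

def masked_log_softmax(x, valid_indices, dim, neg=-1e9):
    # Add a large finite penalty to the excluded entries instead of the
    # -inf that the restriction writes, so check_validity has nothing to
    # flag; the masked entries still get negligible probability.
    valid = set(valid_indices)
    mask = [0.0 if i in valid else neg for i in range(dim)]
    return dy.log_softmax(x + dy.inputVector(mask))

# e.g. instead of log_softmax(f_i, valid_frames):
# logloss = masked_log_softmax(f_i, valid_frames, dim=f_i.dim()[0][0])
# (dim() returns (shape, batch) in the Python API, so dim()[0][0] is the vector length)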

Like you, even with check_validity off, I am still getting NaN errors later on; I suspect those come from another source, which I have yet to pin down.