lazyprogrammer / machine_learning_examples

A collection of machine learning examples and tutorials.

Home Page: https://lazyprogrammer.me

RuntimeWarning: invalid value encountered in less

rafaleo opened this issue · comments

I'm training the attention model on different data, and I've encountered a strange error after several epochs of running:

Using TensorFlow backend.
num samples: 29024
input seq: 29024
Found 5000 unique input tokens.
target seq: 29024 | inp: 29024
Found 5000 unique output tokens.
encoder_data.shape: (29024, 11)
encoder_data[0]: [ 0  0  0  0  0  0  0  0  0  0 43]
decoder_data[0]: [  3 266   1   0   0   0   0   0   0   0   0   0   0   0]
decoder_data.shape: (29024, 14)
Loading word vectors...
Found 400000 word vectors.
Filling pre-trained embeddings...
OUTPUT size: (29024, 14, 5001)
2020-03-16 09:37:03.339535: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
C:\Users\cp\Anaconda3\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Train on 23219 samples, validate on 5805 samples
Epoch 1/50
23219/23219 [==============================] - 1056s 45ms/step - loss: 1.6740 - acc: 0.4022 - val_loss: 2.0796 - val_acc: 0.3890

Epoch 00001: val_loss improved from inf to 2.07957, saving model to ./large_files/weights/engpol-30k-epoch.01-loss.2.08.hdf5
Epoch 2/50
23219/23219 [==============================] - 1019s 44ms/step - loss: 1.2243 - acc: 0.5152 - val_loss: 1.8456 - val_acc: 0.4375

Epoch 00002: val_loss improved from 2.07957 to 1.84557, saving model to ./large_files/weights/engpol-30k-epoch.02-loss.1.85.hdf5
Epoch 3/50
23219/23219 [==============================] - 1051s 45ms/step - loss: 0.9595 - acc: 0.5739 - val_loss: 1.7147 - val_acc: 0.4640

Epoch 00003: val_loss improved from 1.84557 to 1.71466, saving model to ./large_files/weights/engpol-30k-epoch.03-loss.1.71.hdf5
Epoch 4/50
23219/23219 [==============================] - 1099s 47ms/step - loss: 0.7664 - acc: 0.6238 - val_loss: 1.6391 - val_acc: 0.4823

Epoch 00004: val_loss improved from 1.71466 to 1.63908, saving model to ./large_files/weights/engpol-30k-epoch.04-loss.1.64.hdf5
Epoch 5/50
23219/23219 [==============================] - 1021s 44ms/step - loss: 0.6217 - acc: 0.6725 - val_loss: 1.6114 - val_acc: 0.4919

Epoch 00005: val_loss improved from 1.63908 to 1.61137, saving model to ./large_files/weights/engpol-30k-epoch.05-loss.1.61.hdf5
Epoch 6/50
23219/23219 [==============================] - 1021s 44ms/step - loss: 0.5111 - acc: 0.7154 - val_loss: 1.6024 - val_acc: 0.5002

Epoch 00006: val_loss improved from 1.61137 to 1.60242, saving model to ./large_files/weights/engpol-30k-epoch.06-loss.1.60.hdf5
Epoch 7/50
23219/23219 [==============================] - 1034s 45ms/step - loss: nan - acc: 0.4895 - val_loss: nan - val_acc: 0.0000e+00
C:\Users\cp\Anaconda3\lib\site-packages\keras\callbacks\callbacks.py:709: RuntimeWarning: invalid value encountered in less
  if self.monitor_op(current, self.best):

Epoch 00007: val_loss did not improve from 1.60242
Epoch 8/50
 9796/23219 [===========>..................] - ETA: 11:31 - loss: nan - acc: 0.0000e+00Traceback (most recent call last)
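For context on the warning itself: when `ModelCheckpoint` monitors `val_loss`, its `monitor_op` is `np.less`, and any numeric comparison with NaN evaluates to False (some NumPy versions also emit this RuntimeWarning for it). That is why epoch 7 reports "did not improve" once `val_loss` becomes NaN. A minimal sketch of that comparison, with `best` and `current` as stand-in values taken from the log:

```python
import numpy as np

best = 1.60242          # best val_loss so far (epoch 6)
current = float("nan")  # val_loss at epoch 7

# ModelCheckpoint effectively does: if np.less(current, best): save()
improved = np.less(current, best)  # any comparison with NaN is False
print(bool(improved))              # -> False, so "did not improve"
```

So the warning is a symptom, not the cause: the real problem is that the loss itself became NaN.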

I've checked the input data and it seems fine; there are no missing values. What could cause this issue during training? How can I monitor what went wrong? Is it possible that some value goes to infinity (in the current matrix data format)? The problem apparently always occurs when the validation loss is close to converging.
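One way to narrow this down (a sketch, not specific to this repo's code): verify every input array is finite before training, and stop the run the moment a NaN loss appears. The `check_finite` helper below is a hypothetical name; the callback and `clipnorm` options are standard Keras APIs, though exact import paths vary by version.

```python
import numpy as np

def check_finite(name, arr):
    """Fail fast if an array contains NaN or inf before training starts."""
    arr = np.asarray(arr, dtype=np.float64)
    if not np.isfinite(arr).all():
        bad = np.argwhere(~np.isfinite(arr))
        raise ValueError(f"{name} has non-finite values at indices {bad[:5]}")

# check_finite("encoder_data", encoder_data)   # run on each input array

# To abort as soon as the loss goes NaN (instead of wasting epochs):
#   model.fit(..., callbacks=[keras.callbacks.TerminateOnNaN()])
# Exploding gradients are a common cause of mid-training NaNs; clipping often helps:
#   model.compile(optimizer=keras.optimizers.Adam(clipnorm=1.0), ...)
```

If the inputs pass the finiteness check, the usual suspects are exploding gradients or a too-high learning rate rather than the data itself.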

Please use the course Q&A for course-related questions