OverLordGoldDragon / keras-adamw

Keras/TF implementation of AdamW, SGDW, NadamW, Warm Restarts, and Learning Rate multipliers

Invalid argument: Input to reshape is a tensor with 4300800 values, but the requested shape has 19268370432

papadako opened this issue

Hi,

I am using keras-adamw with bert-for-tf2 under an AMD ROCm environment, and sometimes I get an error like the following:

File "bert-decept.py", line 543, in
history = fit_model(model, data, BATCH_SIZE, EPOCHS, tensorboard_callback, model_checkpoint_callback,
File "bert-decept.py", line 438, in fit_model
history = model.fit(
File "/home/papadako/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training_v1.py", line 766, in fit
return func.fit(
File "/home/papadako/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 649, in fit
return fit_loop(
File "/home/papadako/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 386, in model_iteration
batch_outs = f(ins_batch)
File "/home/papadako/.local/lib/python3.8/site-packages/tensorflow/python/keras/backend.py", line 3631, in call
fetched = self._callable_fn(*array_vals,
File "/home/papadako/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1470, in call
ret = tf_session.TF_SessionRunCallable(self._session._session,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input to reshape is a tensor with 4300800 values, but the requested shape has 19268370432
[[{{node bert_1/encoder/layer_7/attention/self/query/Tensordot}}]]
[[Func/training_2/AdamW/gradients/gradients/bert_1/encoder/layer_7/output/dropout_62/cond_grad/StatelessIf/then/_11515/input/_23174/_9837]]
(1) Invalid argument: Input to reshape is a tensor with 4300800 values, but the requested shape has 19268370432
[[{{node bert_1/encoder/layer_7/attention/self/query/Tensordot}}]]
0 successful operations.
0 derived errors ignored.

or

File "bert-decept.py", line 543, in
history = fit_model(model, data, BATCH_SIZE, EPOCHS, tensorboard_callback, model_checkpoint_callback,
File "bert-decept.py", line 438, in fit_model
history = model.fit(
File "/home/papadako/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training_v1.py", line 766, in fit
return func.fit(
File "/home/papadako/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 649, in fit
return fit_loop(
File "/home/papadako/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 386, in model_iteration
batch_outs = f(ins_batch)
File "/home/papadako/.local/lib/python3.8/site-packages/tensorflow/python/keras/backend.py", line 3631, in call
fetched = self._callable_fn(*array_vals,
File "/home/papadako/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1470, in call
ret = tf_session.TF_SessionRunCallable(self._session._session,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Size 0 must be non-negative, not -1737945760
[[{{node bert/encoder/layer_5/attention/self/query/Tensordot/Reshape}}]]
[[Func/training/AdamW/gradients/gradients/bert/encoder/layer_9/output/dropout_30/cond_grad/StatelessIf/then/_696/input/_2295/_3389]]
(1) Invalid argument: Size 0 must be non-negative, not -1737945760
[[{{node bert/encoder/layer_5/attention/self/query/Tensordot/Reshape}}]]
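
For context, the relevant part of my setup looks roughly like the minimal sketch below. The real script builds the model with bert-for-tf2; the stand-in model and all hyperparameters here are placeholders, not the actual values from bert-decept.py.

```python
import tensorflow as tf
from keras_adamw import AdamW

# Stand-in for the real bert-for-tf2 encoder, just to keep the sketch
# self-contained; layer sizes are placeholders.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(768, activation="relu", input_shape=(768,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# keras-adamw optimizer, configured along the lines of the README examples;
# lr / total_iterations here are placeholders, not the real training settings.
optimizer = AdamW(lr=1e-4, model=model,
                  use_cosine_annealing=True, total_iterations=1000)

model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```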

At least to my inexperienced eyes this looks like an invalid pointer reference, so it is probably not a problem with keras-adamw but with ROCm. Does anyone have any idea/insight into what the problem might be?
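
For what it's worth, here is a quick back-of-the-envelope check on the numbers in the errors (my own reasoning, not a confirmed diagnosis): both values look like corrupted shape metadata rather than anything the model could legitimately request.

```python
# Sanity-check the sizes reported in the two tracebacks.
INT32_MAX = 2**31 - 1            # 2147483647

requested = 19268370432          # requested reshape size from the first error
print(requested > INT32_MAX)     # True: does not even fit in a signed 32-bit int

size0 = -1737945760              # "Size 0" from the second error
print(size0 < 0)                 # True: a negative dimension is never valid
```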

Best regards
Panagiotis

I just got the error with another optimizer, so I am closing this.
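
For anyone who lands here later: a quick way to rule the optimizer in or out is to rerun the same training with a stock optimizer and see whether the error persists. A self-contained sketch with a stand-in model and random data (not the actual bert-decept.py):

```python
import numpy as np
import tensorflow as tf

# Same stand-in model as above, compiled with plain tf.keras Adam instead of
# keras-adamw; if the InvalidArgumentError still appears, the problem is
# below the optimizer (e.g. the ROCm/TF stack), not keras-adamw.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(768, activation="relu", input_shape=(768,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy")

x = np.random.rand(64, 768).astype("float32")
y = np.random.randint(0, 2, size=(64,))
model.fit(x, y, batch_size=8, epochs=1, verbose=0)
```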