InvalidArgumentError when running on cluster
kakawait opened this issue
The demo works when run locally, but fails when I try to execute it remotely (to take advantage of the GPU) using:
tf.Session("grpc://HOSTNAME:2222")
I get the following error when running mnist_2.0_five_layers_sigmoid.py:
Caused by op 'Variable_1/Assign', defined at:
File "mnist_2.0_five_layers_sigmoid.py", line 51, in <module>
B1 = tf.Variable(tf.zeros([L]))
File "/Volumes/Users/<USERNAME>/.tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 226, in __init__
expected_shape=expected_shape)
File "/Volumes/Users/<USERNAME>/.tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 334, in _init_from_args
validate_shape=validate_shape).op
File "/Volumes/Users/<USERNAME>/.tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 47, in assign
use_locking=use_locking, name=name)
File "/Volumes/Users/<USERNAME>/.tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/Volumes/Users/<USERNAME>/.tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/Volumes/Users/<USERNAME>/.tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [10] rhs shape= [200]
[[Node: Variable_1/Assign = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:worker/replica:0/task:0/gpu:0"](Variable_1, zeros)]]
Update: I fixed the InvalidArgumentError from mnist_1.0_softmax.py by upgrading the server's Python version, switching the image from tensorflow/tensorflow:latest-gpu to tensorflow/tensorflow:latest-gpu-py3.
Hmm, never mind, it is about the cluster. When I restart the server everything is OK. It just seems that the cluster does not work well with multiple training scripts...
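For anyone landing here with the same message, here is a rough sketch of what may be happening. It is a toy model in plain Python, not TensorFlow code. The ToyWorker class and its methods are hypothetical names invented for illustration. The sketch assumes that an earlier script (possibly mnist_1.0_softmax.py) left a Variable_1 of shape [10] on the long-running worker, and that the next script then tried to initialize its own Variable_1 with shape [200]. That assumption is inferred from the error text and from the fact that restarting the server clears the problem. It has not been verified against the server's state.

```python
# Toy model only: this is NOT how TensorFlow is implemented.
# ToyWorker is a hypothetical stand-in for a worker process that
# outlives the client scripts connecting to it.

class ToyWorker:
    def __init__(self):
        # name -> shape; survives until reset() or a process restart
        self.variables = {}

    def assign(self, name, shape):
        # The first assignment creates the variable; later assignments
        # under the same name must use the same shape.
        existing = self.variables.setdefault(name, shape)
        if existing != shape:
            raise ValueError(
                "Assign requires shapes of both tensors to match. "
                "lhs shape= %s rhs shape= %s" % (list(existing), list(shape))
            )

    def reset(self):
        # Models restarting the server: all stored variables are dropped.
        self.variables.clear()


worker = ToyWorker()

# Assumed first script: its auto-named Variable_1 has shape [10].
worker.assign("Variable_1", (10,))

# Second script: B1 = tf.Variable(tf.zeros([L])) also becomes Variable_1,
# but with shape [200], so it collides with the leftover state.
try:
    worker.assign("Variable_1", (200,))
    collided = False
    message = ""
except ValueError as err:
    collided = True
    message = str(err)

print(collided, message)
```

In this model, calling worker.reset() before the second script plays the role of the server restart. TensorFlow 1.x documents tf.Session.reset(target) for clearing resource containers on a target, and giving variables explicit, script-specific names would avoid sharing the default Variable_N names. Whether either resolves this particular setup is untested here.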