Getting an error while running on Ubuntu with 2 GPUs

Question

nidhinkrishnanv opened this issue 3 years ago

I am getting an error while trying to run the code on Ubuntu with 2 GPUs. Is this related to configuration?

  File "./run_dnn.py", line 911, in <module>
    train(wnd_conf, args['model_ckpt'])
  File "./run_dnn.py", line 154, in train
    tower_train_logits = inf.inference(tower_batch_features, is_train=True)
  File "/experiments/CIKM2020_DMT/DMT_code/model/inference_mlp.py", line 118, in inference
    return self.model.inference(inputs,is_train,is_predict)
  File "/experiments/CIKM2020_DMT/DMT_code/model/net/mmoe_transformer_unbias.py", line 294, in inference
    features = self.embedding_trans(inputs, is_train=is_train)
  File "/experiments/CIKM2020_DMT/DMT_code/model/net/mmoe_transformer_unbias.py", line 229, in embedding_trans
    self.interest_state = self.trans_core(self.seq_data, is_train=is_train)
  File "/experiments/CIKM2020_DMT/DMT_code/model/net/mmoe_transformer_unbias.py", line 207, in trans_core
    user_stat = m.encode_decode(input, name="encode_decode_" + stag, training=is_train)
  File "/experiments/CIKM2020_DMT/DMT_code/model/net/TransformerModel.py", line 56, in encode_decode
    state_encode, state_lens =self.encode((seq_k, seq_k_lens, seq_k_ts), name, training=training)
  File "/experiments/CIKM2020_DMT/DMT_code/model/net/TransformerModel.py", line 118, in encode
    scope="self-attention"
  File "/experiments/CIKM2020_DMT/DMT_code/model/net/TransformerModel_util.py", line 188, in multihead_attention
    Q = tf.layers.dense(queries, d_model, use_bias=True)  # (N, T_q, d_model)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/core.py", line 184, in dense
    return layer.apply(inputs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 817, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 374, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 757, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/layers/core.py", line 963, in call
    outputs = standard_ops.tensordot(inputs, self.kernel, [[rank - 1], [0]])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 2982, in tensordot
    a_reshape, a_free_dims, a_free_dims_static = _tensordot_reshape(a, a_axes)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 2931, in _tensordot_reshape
    free_dims = array_ops.gather(shape_a, free)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 2675, in gather
    return gen_array_ops.gather_v2(params, indices, axis, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 3332, in gather_v2
    "GatherV2", params=params, indices=indices, axis=axis, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Cannot assign a device for operation DnnModel/embedding_trans/trans_sequence_0/encode_decode_sequence_0/encode_decode_sequence_0/num_blocks_0/self-attention/dense/Tensordot/GatherV2: Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Registered kernels:
  device='XLA_CPU'; Taxis in [DT_INT32, DT_INT64]; Tindices in [DT_INT32, DT_INT64]; Tparams in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_HALF, DT_UINT32, DT_UINT64]
  device='XLA_GPU'; Taxis in [DT_INT32, DT_INT64]; Tindices in [DT_INT32, DT_INT64]; Tparams in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT8, ..., DT_QINT32, DT_BFLOAT16, DT_HALF, DT_UINT32, DT_UINT64]
  device='XLA_CPU_JIT'; Taxis in [DT_INT32, DT_INT64]; Tindices in [DT_INT32, DT_INT64]; Tparams in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_HALF, DT_UINT32, DT_UINT64]
  device='XLA_GPU_JIT'; Taxis in [DT_INT32, DT_INT64]; Tindices in [DT_INT32, DT_INT64]; Tparams in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT8, ..., DT_QINT32, DT_BFLOAT16, DT_HALF, DT_UINT32, DT_UINT64]
  device='CPU'; Tparams in [DT_UINT64]; Tindices in [DT_INT64]
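
A note on reading the message above: the "Registered kernels" list names only XLA_* devices and CPU, with no plain 'GPU' entry for GatherV2. One possible explanation, not confirmed anywhere in this thread, is that the installed TensorFlow build has no CUDA GPU kernels. The sketch below only extracts the device names from such an error message so the gap is easy to see. The helper names (`registered_devices`, `has_plain_gpu_kernel`) are hypothetical, and the sample text is an abbreviated copy of the list above.

```python
import re


def registered_devices(error_text):
    """Return the device names found in a 'Registered kernels' listing, in order."""
    return re.findall(r"device='([^']+)'", error_text)


def has_plain_gpu_kernel(error_text):
    """True only if a kernel is registered for the plain 'GPU' device (not XLA_GPU)."""
    return "GPU" in registered_devices(error_text)


# Abbreviated copy of the kernel list from the error message (type lists shortened).
sample = """Registered kernels:
  device='XLA_CPU'; Taxis in [DT_INT32, DT_INT64]
  device='XLA_GPU'; Taxis in [DT_INT32, DT_INT64]
  device='XLA_CPU_JIT'; Taxis in [DT_INT32, DT_INT64]
  device='XLA_GPU_JIT'; Taxis in [DT_INT32, DT_INT64]
  device='CPU'; Tparams in [DT_UINT64]; Tindices in [DT_INT64]"""

print(registered_devices(sample))
# ['XLA_CPU', 'XLA_GPU', 'XLA_CPU_JIT', 'XLA_GPU_JIT', 'CPU']
print(has_plain_gpu_kernel(sample))
# False
```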

Eric Yulong Gu · Answer 1 · Thu Jan 28 2021 16:36:17 GMT+0800 (China Standard Time)

The code was tested on a Linux machine with several GPUs (NVIDIA Tesla P40). The TensorFlow version is 1.12. I have left the company and can no longer provide the detailed requirements of that machine. Thanks!
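
Since the only confirmed detail is the TensorFlow version (1.12), a first step could be to compare the local install against it and check whether that build includes CUDA support. The following is a minimal sketch under stated assumptions, not the author's setup: `check_environment` and `version_tuple` are hypothetical helpers, and the session config with `allow_soft_placement=True` is a commonly used workaround that lets TensorFlow fall back to the CPU for ops without a GPU kernel. It is not something this thread confirms as the fix.

```python
def version_tuple(version):
    """Turn a string such as '1.12.0' into (1, 12) for a coarse comparison."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def check_environment(expected=(1, 12)):
    """Report how the local TensorFlow install compares with the tested one."""
    try:
        import tensorflow as tf
    except ImportError:
        return {"installed": False}

    # TF 1.12 exposes ConfigProto at the top level; later releases keep the
    # 1.x API under tf.compat.v1, so prefer that namespace when it exists.
    v1 = getattr(getattr(tf, "compat", None), "v1", None) or tf

    # Assumption: soft placement is a workaround, not a confirmed fix here.
    config = v1.ConfigProto(
        allow_soft_placement=True,   # fall back to CPU if no GPU kernel exists
        log_device_placement=True,   # log where each op is actually placed
    )
    return {
        "installed": True,
        "version": tf.__version__,
        "matches_tested_version": version_tuple(tf.__version__) == expected,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "session_config": config,
    }


if __name__ == "__main__":
    print(check_environment())
```

With TensorFlow 1.x, the returned `session_config` would be passed as `tf.Session(config=...)`. Where `run_dnn.py` creates its session is not shown in this thread, so that location is left open.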