WXinlong / ASIS

Associatively Segmenting Instances and Semantics in Point Clouds, CVPR 2019

running problem

lapetite123 opened this issue

When I try to run the training code, I get the following errors:
Current batch/total batch num: 0/697
2019-10-05 23:19:12.279100: E tensorflow/stream_executor/cuda/cuda_blas.cc:636] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "train.py", line 256, in <module>
train()
File "train.py", line 211, in train
train_one_epoch(sess, ops, train_writer)
File "train.py", line 243, in train_one_epoch
feed_dict=feed_dict)
File "/home/anaconda3/envs/asis/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/home/anaconda3/envs/asis/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1128, in _run
feed_dict_tensor, options, run_metadata)
File "/home/anaconda3/envs/asis/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1344, in _do_run
options, run_metadata)
File "/home/anaconda3/envs/asis/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1363, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : m=786432, n=32, k=9
[[Node: layer1/conv0/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](layer1/concat, layer1/conv0/weights/read/_911)]]

Caused by op u'layer1/conv0/Conv2D', defined at:
File "train.py", line 256, in <module>
train()
File "train.py", line 128, in train
pred_sem, pred_ins = get_model(pointclouds_pl, is_training_pl, NUM_CLASSES, bn_decay=bn_decay)
File "/home/ASIS/models/ASIS/model.py", line 29, in get_model
l1_xyz, l1_points, l1_indices = pointnet_sa_module(l0_xyz, l0_points, npoint=1024, radius=0.1, nsample=32, mlp=[32,32,64], mlp2=None, group_all=False, is_training=is_training, bn_decay=bn_decay, scope='layer1')
File "/home/ASIS/utils/pointnet_util.py", line 187, in pointnet_sa_module
data_format=data_format)
File "/home/ASIS/utils/tf_util.py", line 165, in conv2d
data_format=data_format)
File "/home/anaconda3/envs/asis/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 639, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/home/anaconda3/envs/asis/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/anaconda3/envs/asis/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
op_def=op_def)
File "/home/anaconda3/envs/asis/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InternalError (see above for traceback): Blas SGEMM launch failed : m=786432, n=32, k=9
[[Node: layer1/conv0/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](layer1/concat, layer1/conv0/weights/read/_911)]]

I don't know how to solve this and would really appreciate your help. Thanks in advance for your reply!
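For anyone reading the error above, the GEMM dimensions can be checked against the `pointnet_sa_module` arguments shown in the traceback. The sketch below is only an illustration: it assumes the 1x1 convolution is flattened into a matrix product with `m = batch * npoint * nsample`, which is not stated in the thread, and the helper names are hypothetical. Under that assumption, `n=32` lines up with the first entry of `mlp=[32,32,64]` and the batch size works out to 24.

```python
import re

# Error line copied from the traceback in this issue.
ERROR_LINE = "Blas SGEMM launch failed : m=786432, n=32, k=9"

# Values taken from the pointnet_sa_module(...) call in the traceback.
NPOINT = 1024
NSAMPLE = 32
BYTES_PER_FLOAT32 = 4
BYTES_PER_MIB = 2 ** 20


def parse_gemm_dims(line):
    """Extract (m, n, k) from a 'Blas SGEMM launch failed' message."""
    match = re.search(r"m=(\d+), n=(\d+), k=(\d+)", line)
    if match is None:
        raise ValueError("no GEMM dimensions found in: %r" % line)
    return tuple(int(group) for group in match.groups())


def implied_batch_size(m, npoint, nsample):
    """Assumption: m = batch * npoint * nsample for the flattened 1x1 conv."""
    rows_per_sample = npoint * nsample
    if m % rows_per_sample != 0:
        raise ValueError("m is not a multiple of npoint * nsample")
    return m // rows_per_sample


def gemm_buffer_mib(m, n, k):
    """Size in MiB of the float32 input (m x k) and output (m x n) matrices."""
    input_mib = m * k * BYTES_PER_FLOAT32 / BYTES_PER_MIB
    output_mib = m * n * BYTES_PER_FLOAT32 / BYTES_PER_MIB
    return input_mib, output_mib


if __name__ == "__main__":
    m, n, k = parse_gemm_dims(ERROR_LINE)
    print("implied batch size:", implied_batch_size(m, NPOINT, NSAMPLE))
    print("input/output MiB:", gemm_buffer_mib(m, n, k))
```

The two matrices come to roughly 27 MiB and 96 MiB, so this single operation is modest in size. That suggests, without proving it, that the failure has more to do with the state of the GPU or the CUDA/TensorFlow installation than with this layer being too large.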

Hi @lapetite123, has this issue been solved? It looks like your environment was not set up correctly.
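The reply does not say which part of the environment is at fault. One thing that can be checked quickly is whether another process is already holding most of the GPU memory. The sketch below is an assumption-laden diagnostic, not a confirmed fix: it shells out to `nvidia-smi` (if it is on the `PATH`) and reports used and total memory per GPU. The function names are hypothetical, and it uses Python 3's `shutil.which`, so it should be run outside the Python 2.7 environment shown in the traceback.

```python
import shutil
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' rows (MiB) as printed by nvidia-smi in CSV mode."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            used, total = (int(field.strip()) for field in line.split(","))
        except ValueError:
            # Skip rows that do not hold two integers, e.g. "[N/A]" values.
            continue
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Return [(used_mib, total_mib), ...], or None if nvidia-smi is unusable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        output = subprocess.check_output(
            [exe,
             "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            universal_newlines=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return None
    return parse_memory_csv(output)


if __name__ == "__main__":
    memory = query_gpu_memory()
    if memory is None:
        print("nvidia-smi not available; cannot inspect GPU memory")
    else:
        for index, (used, total) in enumerate(memory):
            print("GPU %d: %d / %d MiB in use" % (index, used, total))
```

If a GPU is already nearly full before training starts, freeing it or selecting another device is worth trying. In TensorFlow 1.x, setting `gpu_options.allow_growth = True` on the `tf.ConfigProto` passed to `tf.Session` is another commonly suggested workaround for this error, though nothing in this thread confirms that it applies here. A mismatch between the installed CUDA/cuDNN versions and the TensorFlow build is a further possibility to rule out.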

I ran into the same problem. Could you please help solve it?