Blas GEMM launch failed
mtli77 opened this issue · comments
Hi @HuguesTHOMAS
During training, the following error was reported:
InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(2054, 256), b.shape=(2054, 128), m=256, n=128, k=2054
[[node optimizer/gradients/KernelPointNetwork/layer_2/resnetb_deformable_0/conv1/MatMul_grad/MatMul_1 (defined at /disk/tia/KPConv/utils/trainer_ss.py:141) = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](KernelPointNetwork/layer_1/resnetb_strided_1/LeakyRelu, optimizer/gradients/AddN_110)]]
My system: Ubuntu 16.04, CUDA 9.2, cuDNN 7.6.5, NVIDIA TITAN RTX GPU.
My Anaconda environment: cudatoolkit 9.0, cudnn 7.6.5, tensorflow-gpu 1.12.3.
This error has occurred several times, along with the following:
2020-11-10 21:13:16.706760: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2020-11-10 21:13:16.706819: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:274] Unexpected Event status: 1
Could you please help me solve this problem?
Best regards!
This is strange; it seems to be an error in the network model. I have not updated this repo for a long time, but I have never seen this error... Did you change anything in the model or network block definitions (files in the models folder)?
Hi @HuguesTHOMAS,
I did not change any of the files in the models folder.
The same error also occurs in other projects based on KPConv, such as MPRM and D3Feat:
2020-12-17 14:24:21.381140: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
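As a side check on the "error in the network model" idea, the shapes printed in the first GEMM error can be verified by hand. The sketch below is plain Python and does not call TensorFlow; gemm_dims is a hypothetical helper written only for this illustration, and it assumes the usual convention that m, n, k describe the product op(a) @ op(b) after the transpose flags are applied.

```python
def gemm_dims(a_shape, b_shape, transpose_a=False, transpose_b=False):
    """Return (m, n, k) for op(a) @ op(b), or raise ValueError on a mismatch.

    Hypothetical helper for illustration; not part of TensorFlow or KPConv.
    """
    a_rows, a_cols = a_shape
    b_rows, b_cols = b_shape
    # Apply the transpose flags the same way MatMul's attributes describe them.
    if transpose_a:
        a_rows, a_cols = a_cols, a_rows
    if transpose_b:
        b_rows, b_cols = b_cols, b_rows
    # The inner dimensions must agree for the product to be defined.
    if a_cols != b_rows:
        raise ValueError(
            "inner dimensions differ: {} vs {}".format(a_cols, b_rows)
        )
    return a_rows, b_cols, a_cols


# Values copied from the error message: a.shape, b.shape, transpose_a=true.
m, n, k = gemm_dims((2054, 256), (2054, 128), transpose_a=True)
print(m, n, k)
```

With the values from the log this gives m=256, n=128, k=2054, the same numbers the error message reports, so the operand shapes themselves are consistent. That only rules out a plain shape mismatch in this MatMul; it says nothing about what actually caused the launch to fail.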