HuguesTHOMAS / KPConv

Kernel Point Convolutions

Blas GEMM launch failed

mtli77 opened this issue.

Hi @HuguesTHOMAS,
During training, the following error was reported:

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(2054, 256), b.shape=(2054, 128), m=256, n=128, k=2054
         [[node optimizer/gradients/KernelPointNetwork/layer_2/resnetb_deformable_0/conv1/MatMul_grad/MatMul_1 (defined at /disk/tia/KPConv/utils/trainer_ss.py:141)  = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](KernelPointNetwork/layer_1/resnetb_strided_1/LeakyRelu, optimizer/gradients/AddN_110)]]
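For readers decoding the log: because the failing node has `transpose_a=true`, the MatMul computes aᵀ·b. Its result is 256×128, which is presumably the gradient of the `conv1` weights, a layer mapping 256 input features to 128 output features over 2054 points. The helper below, `gemm_dims`, is hypothetical and belongs to neither KPConv nor TensorFlow. It is a minimal pure-Python sketch that derives the BLAS `m`, `n`, `k` values from the two shapes and the transpose flags. It shows that the reported shapes are mutually consistent, which suggests the failure comes from the GPU/CUDA setup rather than from a shape mismatch. This is an inference, not a confirmed diagnosis.

```python
def gemm_dims(a_shape, b_shape, transpose_a=False, transpose_b=False):
    """Return (m, n, k) for C = op(A) @ op(B), BLAS-style.

    op(A) has shape (m, k) and op(B) has shape (k, n); a transpose flag
    swaps the stored (rows, cols) of that operand.
    """
    a_rows, a_cols = a_shape
    b_rows, b_cols = b_shape
    # Effective shape of op(A): (m, k)
    m, k_a = (a_cols, a_rows) if transpose_a else (a_rows, a_cols)
    # Effective shape of op(B): (k, n)
    k_b, n = (b_cols, b_rows) if transpose_b else (b_rows, b_cols)
    if k_a != k_b:
        raise ValueError(f"inner dimensions differ: {k_a} vs {k_b}")
    return m, n, k_a


# Shapes and flags copied from the error message above.
m, n, k = gemm_dims((2054, 256), (2054, 128), transpose_a=True)
print(m, n, k)  # 256 128 2054
```

The printed values match `m=256, n=128, k=2054` in the log, so TensorFlow built a valid GEMM that cuBLAS then failed to launch.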

I am using Ubuntu 16.04, CUDA 9.2, cuDNN 7.6.5, and an NVIDIA TITAN RTX GPU,
with an Anaconda environment containing cudatoolkit 9.0, cudnn 7.6.5, and tensorflow-gpu 1.12.3.

This error has occurred several times, as has the following one:

2020-11-10 21:13:16.706760: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2020-11-10 21:13:16.706819: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:274] Unexpected Event status: 1

Could you please help me solve this problem?

Best regards!

This is strange; it seems to be an error in the network model. I have not updated this repo for a long time, but I have never seen this error... Did you change anything in the model or network block definitions (files in the models folder)?

Hi @HuguesTHOMAS,
I did not change any of the files in the models folder.
The same error also occurs in projects based on KPConv, such as MPRM and D3Feat:

2020-12-17 14:24:21.381140: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

Hi @mtli77,

I have run into the same problem. Have you solved it?