NVlabs / tiny-cuda-nn

Lightning fast C++/CUDA neural network framework

Backward method of the grid encoding

cppcph opened this issue

I was trying to get the gradient of the loss with respect to the encoder input for the signed distance field (SDF) scenario in instant-ngp. However, the backward method of the grid encoding in tiny-cuda-nn appears to fail when the dL_dinput argument is not a null pointer.

The following code seems to be the cause of the CUDA memory error:
https://github.com/NVlabs/tiny-cuda-nn/blob/235d1fde956dc04966940f9d1bec66aa3bdb705a/include/tiny-cuda-nn/encodings/grid.h#L893C3-L893C16

In debug mode, the following message is reported:
Thread 1 "instant-ngp" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
0x00007fffe3b71bd0 in void tcnn::kernel_grid_backward_input<__half, 3u>(unsigned int, unsigned int, __half const*, float const*, tcnn::MatrixView)<<<(2048,1,1),(128,1,1)>>> ()
cuda block (0, 0, 0) thread (96, 0, 0)
CUDA focus unchanged.
cuda block (0, 0, 0) thread (96, 0, 0)
[Switching focus to CUDA kernel 0, grid 44, block (0,0,0), thread (96,0,0), device 0, sm 0, warp 0, lane 0]
0x00007fffe3b71bd0 in void tcnn::kernel_grid_backward_input<__half, 3u>(unsigned int, unsigned int, __half const*, float const*, tcnn::MatrixView)<<<(2048,1,1),(128,1,1)>>> ()

I am using CUDA 11.3 with NVIDIA driver 530.30.02. The graphics card is a GeForce RTX 2060.
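For reference, the quantity requested through dL_dinput can be illustrated with a small CPU sketch. This is not tiny-cuda-nn's kernel and not a diagnosis of the crash: it assumes a hypothetical single-level, single-feature 1D grid with linear interpolation, and the names Grid1D, cell_index, grid_forward and grid_backward_input are made up for illustration. It only shows how a gradient with respect to the input position follows from the interpolation weights, and why the cell index has to stay inside the grid.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 1D, single-level, single-feature grid (illustration only;
// this is not the memory layout used by tiny-cuda-nn).
struct Grid1D {
    std::vector<float> values;  // feature value stored at each grid vertex
};

// Index of the cell containing x, clamped so that index + 1 is a valid vertex.
std::size_t cell_index(const Grid1D& g, float x) {
    assert(g.values.size() >= 2);
    assert(x >= 0.0f && x <= 1.0f);
    const float pos = x * static_cast<float>(g.values.size() - 1);
    std::size_t i = static_cast<std::size_t>(std::floor(pos));
    if (i > g.values.size() - 2) {
        i = g.values.size() - 2;  // x == 1 falls into the last cell
    }
    return i;
}

// Forward pass: linear interpolation between the two vertices of the cell.
float grid_forward(const Grid1D& g, float x) {
    const float scale = static_cast<float>(g.values.size() - 1);
    const std::size_t i = cell_index(g, x);
    const float w = x * scale - static_cast<float>(i);
    return (1.0f - w) * g.values[i] + w * g.values[i + 1];
}

// Backward pass with respect to the input: inside a cell the output is
// linear in x, so d(output)/dx = scale * (v[i + 1] - v[i]). The chain rule
// then gives dL/dx = dL/d(output) * d(output)/dx.
float grid_backward_input(const Grid1D& g, float x, float dL_doutput) {
    const float scale = static_cast<float>(g.values.size() - 1);
    const std::size_t i = cell_index(g, x);
    return dL_doutput * scale * (g.values[i + 1] - g.values[i]);
}
```

A finite-difference check such as (grid_forward(g, x + h) - grid_forward(g, x - h)) / (2 * h) should agree with grid_backward_input(g, x, 1.0f) as long as both sample points stay inside the same cell.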