ros-industrial / yak

A library for integrating depth images into Truncated Signed Distance Fields.

Assertion failed if volume_x dimension is not a multiple of 32

dave992 opened this issue

For our application, I want to place the TSDF volume above a 1200 x 2000 mm table. However, an assertion fails when I use volume_x = 2000. On closer inspection, it fails whenever volume_x is not a multiple of 32 (kinfu.cpp:52). Is there a reason this assertion is in place?

The complete error message:
OpenCV Error: Assertion failed (params.volume_dims[0] % 32 == 0) in KinFu, file /home/dave992/workspaces/teriyaki_ws/src/yak/yak/src/kfusion/kinfu.cpp, line 52
terminate called after throwing an instance of 'cv::Exception'

As I understand it, this is rooted in the way CUDA handles parallelization. CUDA executes threads in groups of 32 (aka "warps"), and the assertion exists to make sure that the overall number of voxels in the volume can always be evenly split among warps without any remainder.

I haven't tried simply removing that assertion, so I'm unclear on whether violating this constraint would result in inefficient parallelization or some sort of catastrophic failure.

From what I remember from my (short) CUDA introduction, it is indeed good practice to define blocks with a multiple of 32 threads. Because threads are scheduled in groups of 32, this avoids launching partially filled warps.

The assertion seems to imply that the division into blocks assumes the dimension is divisible by 32 without a remainder. That should not be necessary: it can be handled by rounding the required number of blocks up and adding a thread check to the kernel. This still allows the SMs to be fully utilized, except for the last block, which handles only the leftover calculations.
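
For illustration, the round-up-and-guard pattern described above can be sketched on the host in plain C++, so it runs without a GPU. This is only a sketch under assumptions: `div_up` and `simulate_launch` are hypothetical names rather than yak functions, and a one-dimensional block of 32 threads is assumed, which may not match the block shape the real kernels use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ceiling division: the number of blocks needed to cover n items.
// Hypothetical helper, not part of yak.
constexpr int div_up(int n, int block_size) {
    return (n + block_size - 1) / block_size;
}

// Host-side stand-in for a 1D kernel launch over the x dimension.
// Returns how many times each x index was processed.
// The 1D layout and block size of 32 are assumptions for illustration.
std::vector<int> simulate_launch(int volume_x, int block_size = 32) {
    assert(volume_x > 0 && block_size > 0);
    std::vector<int> visits(static_cast<std::size_t>(volume_x), 0);
    const int num_blocks = div_up(volume_x, block_size);  // rounded up
    for (int block = 0; block < num_blocks; ++block) {
        for (int thread = 0; thread < block_size; ++thread) {
            // Equivalent of blockIdx.x * blockDim.x + threadIdx.x
            const int x = block * block_size + thread;
            // The "thread check": surplus threads in the last block do nothing.
            if (x >= volume_x) continue;
            ++visits[static_cast<std::size_t>(x)];
        }
    }
    return visits;
}
```

With `volume_x = 2000`, `div_up` gives 63 blocks (2016 threads), and the guard makes the 16 surplus threads in the last block skip their work, so every index from 0 to 1999 is processed exactly once.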

I am currently checking the kernels to make sure there are thread checks in place. If that is the case it should be possible to remove the assertion completely.

I checked all CUDA kernels in tsdf_volume.cu and there was only one kernel where I am not sure:

#if __CUDA_ARCH__ >= 120
if (__all_sync(0xFFFFFFFF, x >= volume.dims.x) || __all_sync(0xFFFFFFFF, y >= volume.dims.y))
    return;
#else
if (Emulation::All(x >= volume.dims.x, cta_buffer) || Emulation::All(y >= volume.dims.y, cta_buffer))
    return;
#endif

That looks like a thread check, but I am not familiar with the specific implementation there. All other kernels in tsdf_volume.cu do a (regular) thread check.
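
As a rough aid for reading the snippet above: `__all_sync(mask, predicate)` is a CUDA warp-vote intrinsic that returns non-zero only when the predicate is non-zero for every participating thread in the warp. The host-side sketch below emulates that vote for the x test only. `warp_exits_early` is a hypothetical name, and the mapping of the 32 lanes to consecutive x indices is an assumption, not something confirmed from the yak sources.

```cpp
#include <cassert>

// Emulates the warp-wide vote on (x >= volume_x): returns true only if
// every lane of the warp is out of range in x.
// warp_start_x is the x index handled by lane 0; lanes are assumed to
// cover consecutive x values (an assumption for illustration).
bool warp_exits_early(int warp_start_x, int volume_x, int warp_size = 32) {
    assert(warp_size > 0);
    for (int lane = 0; lane < warp_size; ++lane) {
        if (warp_start_x + lane < volume_x) {
            return false;  // at least one lane is in range, so the vote fails
        }
    }
    return true;  // all lanes are out of range, the whole warp returns
}
```

Under these assumptions, with `volume_x = 641` a warp starting at x = 640 does not exit early, because lane 0 is still in range. The early return therefore only skips warps that are entirely outside the volume, and the lanes of a partially covered warp would still need their own bounds check before touching voxel memory.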

The other CUDA files are not involved in processing the TSDF volume, right? I had a look at the files and found that only proj_icp.cu does not do a thread check, but it does not seem to involve the volume dimensions.

Everything seems to work if I remove the assertion from the code and run the demo. I did this for volume_x = 640 and volume_x = 641. Calling the GenerateMesh service produces a .ply of the bunny as expected.

> The other CUDA files are not involved in processing the TSDF volume, right? I had a look at the files and found that only proj_icp.cu does not do a thread check, but it does not seem to involve the volume dimensions.

If I recall correctly, during projective ICP the new depth image is compared to the previous depth image, so it makes sense that it doesn't need to know much about the volume.

Interesting to know that this assertion is not actually needed. This should simplify the process of initializing a new volume from user-specified parameters.

Alright, I made a PR for this then.

Closing since #34 has been merged. Thank you very much for looking into this and submitting the PR!