NVIDIA / gvdb-voxels

Sparse volume compute and rendering on NVIDIA GPUs

GVDB doesn't appear to render when CUDA device is not 0 (multi-GPU system)

digbeta opened this issue · comments

commented

I have a multi-GPU system and have always used GVDB with device 0. Today I tried running some of the samples (3Dprint and spray deposit) on device 1, and nothing is rendered to the screen. I even shut down my workstation and swapped the cards (both RTX 2070 SUPERs), reversing the order in which they are enumerated in StartCuda (verified by checking each card's UUID to confirm the order was reversed). I only get results rendered to the screen when using device 0.

In my own project, when I split GVDB instances across different GPUs, I can see data being stored on the second GPU, so both GPUs are being used for memory allocation and so on; there just doesn't appear to be valid data when calling ReadRenderTexGL().

Any ideas what might be happening?

commented

Also, nvidia-smi shows both GPUs and no errors are reported. However, I did determine that cuGLGetDevices() is only returning one GPU:

```cpp
assert( cuGLGetDevices( &cudaGLDeviceCount, cudaGLDevices, 10,
                        CU_GL_DEVICE_LIST_ALL /*CU_GL_DEVICE_LIST_NEXT_FRAME*/ ) == CUDA_SUCCESS );
```

Is this possibly related to WGL_NV_multigpu_context?

commented

I've determined that this appears to be related to an OpenGL setting in the NVIDIA Control Panel, "OpenGL rendering GPU". It looks like OpenGL requires a specific GPU to be selected here. By toggling this setting, I can get it to work.

Hi digbeta!

I think you've figured out what the issue is here - here's another take on it, in case it's useful!

ReadRenderTexGL uses CUDA-OpenGL interop to copy GVDB's rendered buffer, mRenderBuf, to an OpenGL texture. However, one of the requirements of CUDA-OpenGL interop is that the CUDA memory and the OpenGL texture have to be on the same GPU. By default, CUDA provides more support for multi-GPU setups than OpenGL does (although it's possible to write multi-GPU OpenGL code using e.g. WGL_NV_gpu_affinity - there's a lot to say about multi-GPU programming that I'm skimming over here!). The result is that the application can select a different GPU for GVDB computation than the one OpenGL selects for rendering, which causes this issue.
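As a concrete illustration of that requirement, a sketch like the following (using the CUDA driver API; it assumes an OpenGL context is already current on the calling thread, and the function name is just for this example) asks which CUDA devices can interop with the current GL context:

```cpp
#include <cassert>
#include <cuda.h>
#include <cudaGL.h>  // cuGLGetDevices; link against the CUDA driver library

// Sketch: find the CUDA device(s) that correspond to the GPU(s) driving the
// *current* OpenGL context. An OpenGL context must already be current on this
// thread before calling cuGLGetDevices.
CUdevice FindGLInteropDevice()
{
    unsigned int glDeviceCount = 0;
    CUdevice     glDevices[10];

    CUresult res = cuGLGetDevices( &glDeviceCount, glDevices, 10,
                                   CU_GL_DEVICE_LIST_ALL );
    assert( res == CUDA_SUCCESS && glDeviceCount > 0 );

    // On a setup like the one in this issue, glDeviceCount will typically be 1:
    // only the GPU selected for OpenGL rendering is returned, so GVDB's
    // computation must run on glDevices[0] for ReadRenderTexGL to work.
    return glDevices[0];
}
```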

There are a couple of ways to solve this, in addition to using the NVIDIA Control Panel. When calling VolumeGVDB::SetCudaDevice/VolumeGVDB::StartCuda, passing GVDB_DEV_FIRST as the devid/devsel argument selects the GPU that will also be used for the OpenGL context.
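A minimal sketch of that first option, following the initialization pattern the GVDB samples use (the exact setup calls in your application may differ):

```cpp
// Sketch: ask GVDB to pick the GPU that will also drive the OpenGL context.
// GVDB_DEV_FIRST is defined by GVDB; the surrounding setup follows the
// pattern used in the GVDB samples.
VolumeGVDB gvdb;
gvdb.SetCudaDevice( GVDB_DEV_FIRST );  // select the OpenGL-capable device
gvdb.Initialize();
```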
Another way, if one wants a GVDB instance per GPU, is to copy the rendered buffer from the non-OpenGL GPU to the OpenGL GPU manually, using cuMemcpyPeer for instance, and then use the CUDA-OpenGL interop code from ReadRenderTexGL to copy that data to OpenGL without going through the CPU. (This is also a place where, if the GPUs are identical, something like SLI or NVLink can help performance: the data can travel over the SLI or NVLink connection instead of going over PCIe to the CPU and then over PCIe again to the other GPU.)
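A sketch of the second option using the CUDA driver API (ctxGL/ctxCompute, the device pointers, and the function name are placeholders for this illustration; error checking is omitted for brevity):

```cpp
#include <cuda.h>

// Sketch: copy a rendered buffer from the compute GPU's context to the
// OpenGL GPU's context. ctxCompute/ctxGL and the device pointers are
// placeholders supplied by the caller.
void CopyRenderBufToGLGpu( CUcontext ctxGL, CUcontext ctxCompute,
                           CUdeviceptr dstOnGLGpu, CUdeviceptr srcOnComputeGpu,
                           size_t numBytes )
{
    // Optional: enable direct peer access (e.g. over NVLink) if available.
    cuCtxSetCurrent( ctxGL );
    cuCtxEnablePeerAccess( ctxCompute, 0 );  // returns an error if unsupported

    // cuMemcpyPeer works whether or not peer access is enabled; without it,
    // the driver stages the copy through host memory.
    cuMemcpyPeer( dstOnGLGpu, ctxGL, srcOnComputeGpu, ctxCompute, numBytes );

    // From here, the existing CUDA-OpenGL interop path in ReadRenderTexGL
    // can copy dstOnGLGpu into the OpenGL texture.
}
```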
There are many other ways to distribute rendering and computation across multiple GPUs, of course - that's just two of them!

commented

Hi, Neil -

Thanks so much for the detailed answer! Yes, I'm reading up on all of this now to get up to speed, and I saw some of your related work on one of the other VR projects (the stereo rendering sample). I do have NVLink, but the P2P functions don't seem to be supported on my setup - that's a whole different issue, though. :)

Thanks so much for taking the time to respond! I'll close this out. Thanks again!

No worries, glad it helps! Let me know if you have any additional questions. (The main author of the gl_multicast sample is Ingo Esser, but I'm happy to route questions!)

commented

Great, thank you!