VirtualGL sometimes does not work
mnakao opened this issue · comments
When running vglrun glxinfo remotely in a vncserver session, the following error occurs roughly once every few runs. Do you know the cause of this?
$ __GLX_VENDOR_LIBRARY_NAME=nvidia VGL_DISPLAY=:1 vglrun glxinfo
X Error of failed request: BadAlloc (insufficient resources for operation)
Major opcode of failed request: 150 (GLX)
Minor opcode of failed request: 5 (X_GLXMakeCurrent)
Serial number of failed request: 0
Current serial number in output stream: 28
When it works normally, the output is as shown below.
$ __GLX_VENDOR_LIBRARY_NAME=nvidia VGL_DISPLAY=:1 vglrun glxinfo | grep OpenGL | head -n 2
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: Tesla V100-PCIE-32GB/PCIe/SSE2
However, vglrun glxgears always fails with a Segmentation fault.
My machine has a Tesla V100, and its OS is Debian GNU/Linux 12.2. I have installed turbovnc_3.0.3_amd64.deb and virtualgl_3.1_amd64.deb.
$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06 Driver Version: 545.23.06 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-PCIE-32GB On | 00000000:65:00.0 Off | 0 |
| N/A 43C P0 26W / 250W | 0MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Best,
The fact that you are overriding __GLX_VENDOR_LIBRARY_NAME and VGL_DISPLAY makes me suspect that you might not understand how VGL is supposed to work. Unless the GPU is attached to Display :1 (which it doesn't appear to be, based on the output of nvidia-smi), setting VGL_DISPLAY=:1 is incorrect, and there should never be any reason to override __GLX_VENDOR_LIBRARY_NAME. If your TurboVNC session is listening on Display :1, which I assume is the case, then that combination of environment variables completely thwarts VirtualGL.

The purpose of VGL is to redirect OpenGL rendering away from the X proxy (TurboVNC), because the X proxy lacks GPU acceleration. Setting VGL_DISPLAY to the TurboVNC session's display forces VGL to use the TurboVNC Server's unaccelerated OpenGL implementation, which is based on Mesa. To make matters worse, you are also forcing the use of NVIDIA's GLX vendor library, which is incompatible with Mesa.

Referring to the VirtualGL User's Guide, you need to either set up a "3D X server" to run on the GPU and point VGL_DISPLAY to that 3D X server, or point VGL_DISPLAY to the EGL or DRI device corresponding to your GPU. Our documentation certainly never instructed you to override __GLX_VENDOR_LIBRARY_NAME.
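As a sketch of the two supported configurations described above (the display number and DRI device path are assumptions; check your own system, e.g. the contents of /dev/dri):

```shell
# Option 1: a 3D X server running on the GPU, typically Display :0.
# :0 is also VirtualGL's default, so -d/VGL_DISPLAY can usually be omitted.
vglrun -d :0 glxgears

# Option 2 (VirtualGL 3.x): the EGL back end, pointing VGL_DISPLAY at the
# GPU's DRI device directly, with no 3D X server required.
vglrun -d /dev/dri/card0 glxgears
```

In both cases, run the command from inside the TurboVNC session and do not set __GLX_VENDOR_LIBRARY_NAME.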
Thank you. I have confirmed that VirtualGL works.