VirtualGL sometimes does not work
mnakao opened this issue · comments
When running vglrun glxinfo remotely in a vncserver session, the following error occurs roughly once every few runs. Do you know the cause of this?
$ __GLX_VENDOR_LIBRARY_NAME=nvidia VGL_DISPLAY=:1 vglrun glxinfo
X Error of failed request: BadAlloc (insufficient resources for operation)
Major opcode of failed request: 150 (GLX)
Minor opcode of failed request: 5 (X_GLXMakeCurrent)
Serial number of failed request: 0
Current serial number in output stream: 28
When it works normally, the output is as shown below.
$ __GLX_VENDOR_LIBRARY_NAME=nvidia VGL_DISPLAY=:1 vglrun glxinfo | grep OpenGL | head -n 2
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: Tesla V100-PCIE-32GB/PCIe/SSE2
However, vglrun glxgears always fails with a Segmentation fault.
My machine has a Tesla V100, and its OS is Debian GNU/Linux 12.2. I have installed turbovnc_3.0.3_amd64.deb and virtualgl_3.1_amd64.deb.
$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06 Driver Version: 545.23.06 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-PCIE-32GB On | 00000000:65:00.0 Off | 0 |
| N/A 43C P0 26W / 250W | 0MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Best,
The fact that you are overriding __GLX_VENDOR_LIBRARY_NAME and VGL_DISPLAY makes me suspect that you might not understand how VGL is supposed to work. Unless the GPU is attached to Display :1 (which it doesn't appear to be, based on the output of nvidia-smi), setting VGL_DISPLAY=:1 is incorrect, and there should never be any reason to override __GLX_VENDOR_LIBRARY_NAME. If your TurboVNC session is listening on Display :1, which I assume is the case, then that combination of environment variables completely thwarts VirtualGL.

The purpose of VGL is to redirect OpenGL rendering away from the X proxy (TurboVNC), because the X proxy lacks GPU acceleration. Setting VGL_DISPLAY to the TurboVNC session's display forces VGL to use the TurboVNC Server's unaccelerated OpenGL implementation, which is based on Mesa. To make matters worse, you are also forcing the use of NVIDIA's GLX vendor library, which is incompatible with Mesa.

Referring to the VirtualGL User's Guide, you need to either set up a "3D X server" to run on the GPU and point VGL_DISPLAY to that 3D X server, or point VGL_DISPLAY to the EGL or DRI device corresponding to your GPU. Our documentation certainly never instructed you to override __GLX_VENDOR_LIBRARY_NAME.
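As a sketch of the two supported configurations described above (the display number and DRI device path are assumptions; check your own system, e.g. the contents of /dev/dri):

```shell
# Option 1: a 3D X server running on the GPU, typically Display :0.
# :0 is also VirtualGL's default, so -d/VGL_DISPLAY can usually be omitted.
vglrun -d :0 glxgears

# Option 2 (VirtualGL 3.x): the EGL back end, pointing VGL_DISPLAY at the
# GPU's DRI device directly, with no 3D X server required.
vglrun -d /dev/dri/card0 glxgears
```

In both cases, run the command from inside the TurboVNC session and do not set __GLX_VENDOR_LIBRARY_NAME.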
Thank you. I have confirmed that VirtualGL works.