command failed. stderr: err: exit status 12
lex-em opened this issue
Describe the bug
The exporter logs the error "command failed. stderr: err: exit status 12" when running in Docker.
To Reproduce
docker-compose.yml
version: "3"
services:
  nvidia_smi_exporter:
    image: utkuozdemir/nvidia_gpu_exporter:0.4.0
    devices:
      - /dev/nvidiactl:/dev/nvidiactl
      - /dev/nvidia0:/dev/nvidia0
    volumes:
      - /usr/lib/libnvidia-ml.so:/usr/lib/x86_64-linux-gnu/libnvidia-ml.so
      - /usr/lib/libnvidia-ml.so.1:/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
      - /usr/bin/nvidia-smi:/usr/bin/nvidia-smi
    ports:
      - 9835:9835
Console output
docker-compose service console output
ts=2022-03-05T12:14:57.407Z caller=exporter.go:108 level=warn msg="Failed to auto-determine query field names, falling back to the built-in list"
2022-03-05T12:14:57.408274606Z ts=2022-03-05T12:14:57.408Z caller=main.go:66 level=info msg="Listening on address" address=:9835
2022-03-05T12:14:57.408511827Z ts=2022-03-05T12:14:57.408Z caller=tls_config.go:195 level=info msg="TLS is disabled." http2=false
2022-03-05T12:15:01.058200295Z ts=2022-03-05T12:15:01.058Z caller=exporter.go:157 level=error error="command failed. stderr: err: exit status 12"
2022-03-05T12:19:49.798104217Z ts=2022-03-05T12:19:49.797Z caller=exporter.go:157 level=error error="command failed. stderr: err: exit status 12"
2022-03-05T12:20:14.187971066Z ts=2022-03-05T12:20:14.187Z caller=exporter.go:157 level=error error="command failed. stderr: err: exit status 12"
2022-03-05T12:20:44.187095757Z ts=2022-03-05T12:20:44.186Z caller=exporter.go:157 level=error error="command failed. stderr: err: exit status 12"
2022-03-05T12:21:14.187231908Z ts=2022-03-05T12:21:14.187Z caller=exporter.go:157 level=error error="command failed. stderr: err: exit status 12"
2022-03-05T12:21:44.187147375Z ts=2022-03-05T12:21:44.187Z caller=exporter.go:157 level=error error="command failed. stderr: err: exit status 12"
2022-03-05T12:22:14.186874585Z ts=2022-03-05T12:22:14.186Z caller=exporter.go:157 level=error error="command failed. stderr: err: exit status 12"
2022-03-05T12:22:44.186995854Z ts=2022-03-05T12:22:44.186Z caller=exporter.go:157 level=error error="command failed. stderr: err: exit status 12"
2022-03-05T12:23:14.188342901Z ts=2022-03-05T12:23:14.187Z caller=exporter.go:157 level=error error="command failed. stderr: err: exit status 12"
Model and Version
OS: Fedora Linux 35
Qt version: 5.15.2
Kernel version: 5.16.11-200.fc35.x86_64
CPU: i7-11800H
GPU: NVIDIA GeForce RTX 3050 Ti Laptop GPU/PCIe/SSE2
NVIDIA Driver Version: 510.47.03
NVML Version: 11.510.47.03
$ ll /dev | grep nvidia
crw-rw-rw-. 1 root root 195, 0 Mar 5 16:14 nvidia0
crw-rw-rw-. 1 root root 195, 255 Mar 5 16:14 nvidiactl
crw-rw-rw-. 1 root root 195, 254 Mar 5 16:14 nvidia-modeset
crw-rw-rw-. 1 root root 505, 0 Mar 5 16:14 nvidia-uvm
crw-rw-rw-. 1 root root 505, 1 Mar 5 16:14 nvidia-uvm-tools
$ ll /usr/lib | grep nvidia
lrwxrwxrwx. 1 root root 26 Feb 1 20:33 libEGL_nvidia.so.0 -> libEGL_nvidia.so.510.47.03
-rwxr-xr-x. 1 root root 1224012 Jan 25 03:35 libEGL_nvidia.so.510.47.03
lrwxrwxrwx. 1 root root 32 Feb 1 20:33 libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.510.47.03
-rwxr-xr-x. 1 root root 71120 Jan 25 03:34 libGLESv1_CM_nvidia.so.510.47.03
lrwxrwxrwx. 1 root root 29 Feb 1 20:33 libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.510.47.03
-rwxr-xr-x. 1 root root 128464 Jan 25 03:34 libGLESv2_nvidia.so.510.47.03
lrwxrwxrwx. 1 root root 26 Feb 1 20:33 libGLX_nvidia.so.0 -> libGLX_nvidia.so.510.47.03
-rwxr-xr-x. 1 root root 1082980 Jan 25 03:34 libGLX_nvidia.so.510.47.03
lrwxrwxrwx. 1 root root 32 Feb 1 20:33 libnvidia-allocator.so.1 -> libnvidia-allocator.so.510.47.03
-rwxr-xr-x. 1 root root 121408 Jan 25 03:34 libnvidia-allocator.so.510.47.03
-rwxr-xr-x. 1 root root 59574832 Jan 25 03:56 libnvidia-compiler.so.510.47.03
-rwxr-xr-x. 1 root root 28190356 Jan 25 03:48 libnvidia-eglcore.so.510.47.03
lrwxrwxrwx. 1 root root 29 Feb 1 20:33 libnvidia-encode.so.1 -> libnvidia-encode.so.510.47.03
-rwxr-xr-x. 1 root root 124048 Jan 25 03:34 libnvidia-encode.so.510.47.03
lrwxrwxrwx. 1 root root 26 Feb 1 20:33 libnvidia-fbc.so.1 -> libnvidia-fbc.so.510.47.03
-rwxr-xr-x. 1 root root 136828 Jan 25 03:34 libnvidia-fbc.so.510.47.03
-rwxr-xr-x. 1 root root 30472084 Jan 25 03:49 libnvidia-glcore.so.510.47.03
-rwxr-xr-x. 1 root root 613928 Jan 25 03:35 libnvidia-glsi.so.510.47.03
-rwxr-xr-x. 1 root root 18955008 Jan 25 03:53 libnvidia-glvkspirv.so.510.47.03
lrwxrwxrwx. 1 root root 25 Feb 1 20:33 libnvidia-ml.so -> libnvidia-ml.so.510.47.03
lrwxrwxrwx. 1 root root 25 Feb 1 20:33 libnvidia-ml.so.1 -> libnvidia-ml.so.510.47.03
-rwxr-xr-x. 1 root root 1702708 Jan 25 03:36 libnvidia-ml.so.510.47.03
lrwxrwxrwx. 1 root root 29 Feb 1 20:33 libnvidia-opencl.so.1 -> libnvidia-opencl.so.510.47.03
-rwxr-xr-x. 1 root root 17126348 Jan 25 03:56 libnvidia-opencl.so.510.47.03
lrwxrwxrwx. 1 root root 34 Feb 1 20:33 libnvidia-opticalflow.so.1 -> libnvidia-opticalflow.so.510.47.03
-rwxr-xr-x. 1 root root 46224 Jan 25 03:33 libnvidia-opticalflow.so.510.47.03
lrwxrwxrwx. 1 root root 37 Feb 1 20:33 libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so.510.47.03
-rwxr-xr-x. 1 root root 12802792 Jan 25 03:40 libnvidia-ptxjitcompiler.so.510.47.03
-rwxr-xr-x. 1 root root 13560 Jan 25 03:33 libnvidia-tls.so.510.47.03
drwxr-xr-x. 2 root root 4096 Feb 6 18:06 nvidia
$ ll /usr/bin | grep nvidia
-rwxr-xr-x. 1 root root 36981 Jan 25 04:57 nvidia-bug-report.sh
-rwxr-xr-x. 1 root root 47528 Feb 14 17:03 nvidia-container-cli
-rwxr-xr-x. 1 root root 2260408 Feb 14 17:04 nvidia-container-runtime
lrwxrwxrwx. 1 root root 33 Feb 17 08:09 nvidia-container-runtime-hook -> /usr/bin/nvidia-container-toolkit
-rwxr-xr-x. 1 root root 2156344 Feb 14 17:04 nvidia-container-toolkit
-rwxr-xr-x. 1 root root 49920 Jan 25 04:09 nvidia-cuda-mps-control
-rwxr-xr-x. 1 root root 14488 Jan 25 04:09 nvidia-cuda-mps-server
-rwxr-xr-x. 1 root root 260912 Jan 25 03:48 nvidia-debugdump
-rwxr-xr-x. 1 root root 721 Feb 14 17:05 nvidia-docker
-rwxr-xr-x. 1 root root 3896400 Jan 25 03:49 nvidia-ngx-updater
-rwxr-xr-x. 1 root root 45272 Feb 2 01:13 nvidia-persistenced
-rwxr-xr-x. 1 root root 978560 Jan 25 03:49 nvidia-powerd
-rwxr-xr-x. 1 root root 323128 Feb 2 01:29 nvidia-settings
-rwxr-xr-x. 1 root root 904 Jan 25 03:45 nvidia-sleep.sh
-rwxr-xr-x. 1 root root 690808 Jan 25 03:49 nvidia-smi
Hi,
I have only tried this on Ubuntu, never on Fedora Linux. Can you please run the following command on the host system (not inside a container) and see if it works:
nvidia-smi --query-gpu="timestamp,driver_version" --format=csv
If it's not working, it's an issue with the driver installation on the host.
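To make the result of that host check easier to report back, the command can be wrapped so its exit status is printed next to the output. This is only a sketch: the function name check_smi and the SMI_BIN override variable are made up for illustration and are not part of the exporter or of nvidia-smi.

```sh
# Hypothetical helper: run the query from the comment above and report
# the exit status. SMI_BIN is an assumed override (not an nvidia-smi
# feature) so the same function can be pointed at a different binary.
check_smi() {
    smi_bin="${SMI_BIN:-nvidia-smi}"
    if "$smi_bin" --query-gpu="timestamp,driver_version" --format=csv; then
        echo "nvidia-smi query succeeded"
    else
        status=$?
        echo "nvidia-smi query failed with exit status $status"
        return "$status"
    fi
}
```

Calling check_smi on the host and then again inside the container shows whether the failure is specific to the container.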
If it works on the host, you will need to experiment a bit: find the correct locations of libnvidia-ml.so, libnvidia-ml.so.1, and nvidia-smi on your Fedora system and mount them into the container. Then run the same command manually inside the container, trying different mount configurations until it works.
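One way to approach the "find the correct locations" step is to probe a few candidate directories for both library names. The sketch below assumes nothing about where a given distribution installs the driver; the function name find_nvml_dir and the candidate list in the usage line are illustrative only.

```sh
# Hypothetical helper: print the first directory (from the arguments)
# that contains both libnvidia-ml.so and libnvidia-ml.so.1.
find_nvml_dir() {
    for dir in "$@"; do
        if [ -e "$dir/libnvidia-ml.so" ] && [ -e "$dir/libnvidia-ml.so.1" ]; then
            echo "$dir"
            return 0
        fi
    done
    echo "libnvidia-ml.so not found in: $*" >&2
    return 1
}

# Example (candidate directories are guesses, checked in the order given):
#   find_nvml_dir /usr/lib64 /usr/lib/x86_64-linux-gnu /usr/lib
#   command -v nvidia-smi
```

The order of the candidates matters, since more than one directory on a host may contain a copy of the library.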
If you find a working config for Fedora+Docker, please share it here so I can add it to the documentation.
Also, looking at your error code (12), this might be helpful: influxdata/telegraf#4388
Please see the solutions suggested on this ticket - they might help with your problem.
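For context on the exit status itself, the nvidia-smi man page lists return codes, and 12 is described there as the NVML shared library not being found or loaded. A small lookup like the following can make log lines easier to read; the function name is hypothetical, only a few codes are included, and the wording should be checked against the man page for your driver version.

```sh
# Hypothetical helper: translate a few nvidia-smi exit statuses into text.
# Meanings follow the nvidia-smi man page as I understand it; verify them
# against 'man nvidia-smi' for the installed driver.
describe_smi_status() {
    case "$1" in
        0)  echo "success" ;;
        9)  echo "NVIDIA driver is not loaded" ;;
        12) echo "NVML shared library could not be found or loaded" ;;
        *)  echo "unlisted status $1, see 'man nvidia-smi'" ;;
    esac
}
```

With the status from this issue, describe_smi_status 12 points at the libnvidia-ml.so mounts rather than at the device nodes.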
I found the mistake: the source library paths were wrong. The correct ones are:
    volumes:
      - /usr/lib64/libnvidia-ml.so:/usr/lib64/libnvidia-ml.so:ro
      - /usr/lib64/libnvidia-ml.so.1:/usr/lib64/libnvidia-ml.so.1:ro
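To try these mounts without editing the compose file, the same paths can be passed to docker run. The sketch below only builds the -v flags; the function name nvml_mount_args is made up, and mounting at the same path inside the container is taken from the fix above, not verified independently.

```sh
# Hypothetical helper: print docker -v flags that mount the two NVML
# libraries from the given host directory read-only at the same path
# inside the container (mirroring the volumes in the fix above).
nvml_mount_args() {
    libdir="$1"
    for lib in libnvidia-ml.so libnvidia-ml.so.1; do
        printf '%s %s/%s:%s/%s:ro\n' -v "$libdir" "$lib" "$libdir" "$lib"
    done
}

# Example one-off check (assumes paths without spaces; devices and the
# nvidia-smi mount are copied from the compose file in this issue):
#   docker run --rm \
#     --device /dev/nvidiactl --device /dev/nvidia0 \
#     $(nvml_mount_args /usr/lib64) \
#     -v /usr/bin/nvidia-smi:/usr/bin/nvidia-smi \
#     --entrypoint nvidia-smi \
#     utkuozdemir/nvidia_gpu_exporter:0.4.0 \
#     --query-gpu="timestamp,driver_version" --format=csv
```

If the query prints a CSV line inside the container, the same volumes should be safe to carry into docker-compose.yml.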