utkuozdemir / nvidia_gpu_exporter

Nvidia GPU exporter for prometheus using nvidia-smi binary

Not able to scrape metrics

suchisur opened this issue · comments

A DaemonSet has been set up on an EKS cluster. Logs of the pod:
ts=2023-03-14T11:29:48.011Z caller=exporter.go:121 level=warn msg="Failed to auto-determine query field names, falling back to the built-in list" error="error running command: exit status 12: command failed. code: 12 | command: nvidia-smi --help-query-gpu | stdout: NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.\nPlease also try adding directory that contains libnvidia-ml.so to your system PATH.\n | stderr: NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.\nPlease also try adding directory that contains libnvidia-ml.so to your system PATH.\n"
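The error means `nvidia-smi` inside the container cannot load the NVIDIA management library. Before changing anything in the chart, it can help to confirm where `libnvidia-ml.so` actually lives on the node. A minimal sketch, to be run on the EKS node itself (e.g. over SSH or SSM); the directories searched here are common defaults (Ubuntu/Debian and RHEL-family layouts) and may differ on your AMI:

```shell
# Search the usual library directories for the NVIDIA management library.
# /usr/lib/x86_64-linux-gnu is the Ubuntu/Debian location; /usr/lib64 is
# typical on RHEL-family distros such as Amazon Linux. Adjust as needed.
LIB_PATHS=$(find /usr/lib/x86_64-linux-gnu /usr/lib64 /usr/lib \
  -name 'libnvidia-ml.so*' 2>/dev/null)
echo "${LIB_PATHS:-libnvidia-ml.so not found in the searched directories}"
```

Whatever paths this prints are the host paths that need to be mounted into the exporter pod.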

Please see the volumeMounts section in the chart: https://github.com/utkuozdemir/helm-charts/blob/master/nvidia-gpu-exporter/values.yaml#L128-L133

You might want to check where these shared libraries are located on your host machines (nodes) and adjust the mounts accordingly, so that they are mounted correctly inside the container. The default paths in the chart are where the libraries live on an Ubuntu 20.04 server.
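As an illustration, an override sketch along these lines could point the chart at the library locations found on the nodes. This assumes the chart exposes `volumeMounts` and `volumes` values as in the linked `values.yaml` (check that file for the exact key names); the host paths shown are hypothetical and must be replaced with the paths found on your EKS nodes (on Amazon Linux AMIs the driver libraries are typically under `/usr/lib64` rather than the Ubuntu path):

```yaml
# custom-values.yaml -- illustrative override, not the chart's defaults.
# Mount the host's NVIDIA management library and nvidia-smi binary into
# the exporter container at the same paths.
volumeMounts:
  - name: nvidia-ml
    mountPath: /usr/lib64/libnvidia-ml.so.1  # replace with your node's path
  - name: nvidia-smi
    mountPath: /usr/bin/nvidia-smi

volumes:
  - name: nvidia-ml
    hostPath:
      path: /usr/lib64/libnvidia-ml.so.1     # replace with your node's path
  - name: nvidia-smi
    hostPath:
      path: /usr/bin/nvidia-smi
```

Applied with something like `helm upgrade --install nvidia-gpu-exporter utkuozdemir/nvidia-gpu-exporter -f custom-values.yaml` (release and repo names here are assumptions; use whatever you installed the chart as).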