utkuozdemir / nvidia_gpu_exporter

Nvidia GPU exporter for prometheus using nvidia-smi binary

Not able to scrape metrics

suchisur opened this issue · comments

A DaemonSet has been set up on an EKS cluster. Logs of the pod:
ts=2023-03-14T11:29:48.011Z caller=exporter.go:121 level=warn msg="Failed to auto-determine query field names, falling back to the built-in list" error="error running command: exit status 12: command failed. code: 12 | command: nvidia-smi --help-query-gpu | stdout: NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.\nPlease also try adding directory that contains libnvidia-ml.so to your system PATH.\n | stderr: NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.\nPlease also try adding directory that contains libnvidia-ml.so to your system PATH.\n"
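The error means `nvidia-smi` inside the container cannot load the NVIDIA management library. Before changing anything in the chart, it can help to confirm where `libnvidia-ml.so` actually lives on the node. A minimal sketch, to be run on the EKS node itself (e.g. over SSH or SSM); the directories searched here are common defaults (Ubuntu/Debian and RHEL-family layouts) and may differ on your AMI:

```shell
# Search the usual library directories for the NVIDIA management library.
# /usr/lib/x86_64-linux-gnu is the Ubuntu/Debian location; /usr/lib64 is
# typical on RHEL-family distros such as Amazon Linux. Adjust as needed.
LIB_PATHS=$(find /usr/lib/x86_64-linux-gnu /usr/lib64 /usr/lib \
  -name 'libnvidia-ml.so*' 2>/dev/null)
echo "${LIB_PATHS:-libnvidia-ml.so not found in the searched directories}"
```

Whatever paths this prints are the host paths that need to be mounted into the exporter pod.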

Please see the volumeMounts section in the chart: https://github.com/utkuozdemir/helm-charts/blob/master/nvidia-gpu-exporter/values.yaml#L128-L133

You might want to check where these shared libraries are located on your host machines (nodes) and adjust the mounts accordingly, so that they are mounted correctly inside the container. The default paths in the chart are where the libraries live on an Ubuntu 20.04 server.
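As an illustration, an override sketch along these lines could point the chart at the library locations found on the nodes. This assumes the chart exposes `volumeMounts` and `volumes` values as in the linked `values.yaml` (check that file for the exact key names); the host paths shown are hypothetical and must be replaced with the paths found on your EKS nodes (on Amazon Linux AMIs the driver libraries are typically under `/usr/lib64` rather than the Ubuntu path):

```yaml
# custom-values.yaml -- illustrative override, not the chart's defaults.
# Mount the host's NVIDIA management library and nvidia-smi binary into
# the exporter container at the same paths.
volumeMounts:
  - name: nvidia-ml
    mountPath: /usr/lib64/libnvidia-ml.so.1  # replace with your node's path
  - name: nvidia-smi
    mountPath: /usr/bin/nvidia-smi

volumes:
  - name: nvidia-ml
    hostPath:
      path: /usr/lib64/libnvidia-ml.so.1     # replace with your node's path
  - name: nvidia-smi
    hostPath:
      path: /usr/bin/nvidia-smi
```

Applied with something like `helm upgrade --install nvidia-gpu-exporter utkuozdemir/nvidia-gpu-exporter -f custom-values.yaml` (release and repo names here are assumptions; use whatever you installed the chart as).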