utkuozdemir / nvidia_gpu_exporter

Nvidia GPU exporter for prometheus using nvidia-smi binary

NSSM missing metrics

jrowinski3d opened this issue

Describe the bug
Metrics are missing when nvidia_gpu_exporter runs as a service on Windows 10. When nvidia_gpu_exporter is run manually, all metrics are exposed.

To Reproduce
Steps to reproduce the behavior:

  1. Follow Install docs
  2. Run the nssm service
  3. Connect to Prometheus and choose your target; 99% of the nvidia metrics are missing

Expected behavior
All metrics should be exposed when the exporter runs as a service.

Console output
The metrics show that the nvidia_smi scrape is failing, but I cannot determine why.

# HELP nvidia_gpu_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, goversion from which nvidia_gpu_exporter was built, and the goos and goarch for the build.
# TYPE nvidia_gpu_exporter_build_info gauge
nvidia_gpu_exporter_build_info{branch="HEAD",goarch="amd64",goos="windows",goversion="go1.20",revision="01f163635ca74aefcfb62cab4dc0d25cc26c0562",version="1.2.0"} 1
# HELP nvidia_smi_command_exit_code Exit code of the last scrape command
# TYPE nvidia_smi_command_exit_code gauge
nvidia_smi_command_exit_code -1
# HELP nvidia_smi_failed_scrapes_total Number of failed scrapes
# TYPE nvidia_smi_failed_scrapes_total counter
nvidia_smi_failed_scrapes_total 2
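
The two values above are the quickest signal that the service cannot run nvidia-smi. As a minimal sketch (not part of the exporter), the check can be automated by parsing the text that the exporter's metrics endpoint returns. The helper names `parse_metrics` and `scrape_is_failing` are hypothetical, and the parser assumes simple lines without trailing timestamps.

```python
def parse_metrics(text):
    """Parse Prometheus text-format lines into {metric_name: float}, dropping labels."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        # Assumption: the value is the last space-separated token (no timestamp).
        head, _, value = line.rpartition(" ")
        name = head.split("{", 1)[0].strip()
        try:
            values[name] = float(value)
        except ValueError:
            continue
    return values


def scrape_is_failing(text):
    """Return True when the reported nvidia-smi exit code is non-zero."""
    metrics = parse_metrics(text)
    return metrics.get("nvidia_smi_command_exit_code", 0.0) != 0.0


# Sample taken from the console output in this issue.
SAMPLE = """\
# HELP nvidia_smi_command_exit_code Exit code of the last scrape command
# TYPE nvidia_smi_command_exit_code gauge
nvidia_smi_command_exit_code -1
# HELP nvidia_smi_failed_scrapes_total Number of failed scrapes
# TYPE nvidia_smi_failed_scrapes_total counter
nvidia_smi_failed_scrapes_total 2
"""

if __name__ == "__main__":
    print(scrape_is_failing(SAMPLE))
```

A non-zero exit code together with a growing `nvidia_smi_failed_scrapes_total` suggests the exporter itself is up but the nvidia-smi invocation is failing, which matches the behavior described here.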

Model and Version

  • GPU Model [`NVIDIA RTXA6000-48Q`]
  • App version and architecture [amd64]
  • Installation method [scoop]
  • Operating System [Windows 10]
  • Nvidia GPU driver version [Production Driver 513.46]

Additional context
When I run nvidia_gpu_exporter manually from PowerShell, all the metrics work fine. I am looking to see if anyone else has this issue or if I am doing something wrong here.

I resolved my issue. I believe it has to do with the AppDirectory setting. For anyone else who runs into this, I did the following:

PS C:\Windows\system32> nssm set nvidia_gpu_exporter AppDirectory "C:\ProgramData\scoop\apps\nvidia_gpu_exporter\current"
Set parameter "AppDirectory" for service "nvidia_gpu_exporter".
PS C:\Windows\system32> nssm set nvidia_gpu_exporter AppStdout "C:\ProgramData\scoop\apps\nvidia_gpu_exporter\current\stdout.log"
Set parameter "AppStdout" for service "nvidia_gpu_exporter".
PS C:\Windows\system32> nssm set nvidia_gpu_exporter AppStderr "C:\ProgramData\scoop\apps\nvidia_gpu_exporter\current\stderr.log"
Set parameter "AppStderr" for service "nvidia_gpu_exporter".

This allowed me to write log files and see any stdout/stderr output coming from the process. Originally I had set the nvidia-smi command directly to point at a custom nvidia-smi.exe, but the Windows folder path was not being parsed correctly.
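
For a different install location, the same three `nssm set` commands can be generated rather than typed by hand. The following is only a sketch: the helper name `nssm_commands` is hypothetical, and it assumes the service name and the `stdout.log`/`stderr.log` file names used in this thread.

```python
from pathlib import PureWindowsPath


def nssm_commands(service, app_dir):
    """Build the nssm commands from this thread for a given service and install directory."""
    # PureWindowsPath keeps backslash separators regardless of the host OS.
    app_dir = PureWindowsPath(app_dir)
    settings = [
        ("AppDirectory", app_dir),
        ("AppStdout", app_dir / "stdout.log"),
        ("AppStderr", app_dir / "stderr.log"),
    ]
    # Quote each path so directories containing spaces stay a single argument.
    return [f'nssm set {service} {name} "{value}"' for name, value in settings]


if __name__ == "__main__":
    install_dir = r"C:\ProgramData\scoop\apps\nvidia_gpu_exporter\current"
    for command in nssm_commands("nvidia_gpu_exporter", install_dir):
        print(command)
```

The printed lines can be pasted into an elevated PowerShell prompt. After restarting the service, `stderr.log` in the install directory is the place to look for the reason nvidia-smi fails.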