utkuozdemir / nvidia_gpu_exporter

Nvidia GPU exporter for prometheus using nvidia-smi binary

nvidia_smi_gpu_info is in reverse order (by uuid) than other metrics

Puupuls opened this issue · comments

Describe the bug
nvidia_smi_gpu_info is listed in the reverse order (by UUID) compared to the other metrics
image
which makes machine-wide dashboards malfunction and have metrics flipped:
image

To Reproduce
Steps to reproduce the behavior:
Have 2 different GPUs in the system? Not sure, since I currently have access to only one such machine. On servers with identical cards this does not happen.

Expected behavior
All metrics should be sorted by UUID so that they align

Console output
N/A

Model and Version

  • GPU Model: NVIDIA GeForce GTX 1080 Ti and NVIDIA GeForce RTX 3060
  • App version and architecture: nvidia_gpu_exporter, version 1.1.0 (branch: HEAD, revision: 086b41f286814c3d1b0eb93141664ff8932eb0c8)
  • Installation method: binary download
  • Operating System: Ubuntu Server 22.04, uname -a: Linux ubuntu 5.15.0-47-generic #51-Ubuntu SMP Thu Aug 11 07:51:15 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Nvidia GPU driver version: nvidia-driver-515-server 515.65.01

Additional context:
image

Hi,

I don't think this is an issue. The output of the metrics endpoint is Prometheus-compatible and is, in fact, generated by the official Prometheus client library. I don't think order matters from the Prometheus perspective. The reverse order is probably (just a guess) caused by the whole lines being sorted alphabetically - because that metric (nvidia_smi_gpu_info) has other labels, they affect the alphabetical order.
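To illustrate that guess, here is a minimal Python sketch (not the exporter's or the client library's actual code). It assumes samples within a metric are ordered by their label values, taking labels in name order. The UUIDs are made up and the label set is shortened to `name` and `uuid`; only the GPU model names come from the report above.

```python
def sort_samples(samples):
    """Order samples by label values, visiting labels in name order.

    This ordering rule is an assumption made for illustration only.
    """
    return sorted(samples, key=lambda labels: [labels[k] for k in sorted(labels)])


# Hypothetical UUIDs; the info metric carries an extra "name" label.
info_samples = [
    {"name": "NVIDIA GeForce RTX 3060", "uuid": "GPU-aaaa"},
    {"name": "NVIDIA GeForce GTX 1080 Ti", "uuid": "GPU-bbbb"},
]
# Other metrics carry only the "uuid" label in this sketch.
other_samples = [{"uuid": "GPU-aaaa"}, {"uuid": "GPU-bbbb"}]

info_order = [s["uuid"] for s in sort_samples(info_samples)]
other_order = [s["uuid"] for s in sort_samples(other_samples)]

print(info_order)   # "name" sorts before "uuid", so the model name decides the order
print(other_order)  # only "uuid" is present, so the UUID decides the order
```

With these made-up values the two lists come out in opposite UUID order, which would also explain why servers with identical cards (identical `name` values) do not show the difference.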

The issue on the dashboard is caused not by the order but by the dashboard not being designed to display metrics for multiple cards at once. You should be able to choose a single GPU from the dropdown at the top, and it should then render correctly.
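Choosing a single GPU amounts to selecting series by their `uuid` label rather than by their position in the output. A rough Python sketch of that idea follows; the sample text, UUIDs, and values are invented, and the parser handles only the simple `name{labels} value` lines shown here.

```python
import re

# Hypothetical excerpt of a metrics endpoint response.
EXPOSITION = """\
nvidia_smi_gpu_info{name="NVIDIA GeForce GTX 1080 Ti",uuid="GPU-bbbb"} 1
nvidia_smi_gpu_info{name="NVIDIA GeForce RTX 3060",uuid="GPU-aaaa"} 1
nvidia_smi_temperature_gpu{uuid="GPU-aaaa"} 41
nvidia_smi_temperature_gpu{uuid="GPU-bbbb"} 63
"""

SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    """Return (metric_name, labels, value) tuples for simple sample lines."""
    samples = []
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:
            name, labels, value = match.groups()
            samples.append((name, dict(LABEL_RE.findall(labels)), float(value)))
    return samples


def for_gpu(samples, uuid):
    """Collect every metric belonging to one GPU, keyed by metric name."""
    return {
        name: (labels, value)
        for name, labels, value in samples
        if labels.get("uuid") == uuid
    }


gpu = for_gpu(parse(EXPOSITION), "GPU-bbbb")
print(gpu["nvidia_smi_gpu_info"][0]["name"])
print(gpu["nvidia_smi_temperature_gpu"][1])
```

Because the lookup is keyed on `uuid`, the result is the same regardless of the order in which the lines appear.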