utkuozdemir / nvidia_gpu_exporter

Nvidia GPU exporter for prometheus using nvidia-smi binary

Add support for the PCIe TX Throughput and RX Throughput metrics

sbates130272 opened this issue

Is your feature request related to a problem? Please describe.

Since the Maxwell architecture, NVIDIA GPUs have contained hardware counters that track traffic on both the incoming and outgoing PCIe links. Adding these counters to the fields exposed by the exporter would be very useful when monitoring these GPUs in an AI/ML fleet.

Describe the solution you'd like

Update the exporter code to support the TX Throughput and RX Throughput fields obtained via the nvidia-smi tool. We probably need to check the GPU architecture first to avoid errors on pre-Maxwell GPUs.
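As a rough illustration of the parsing side, here is a minimal Go sketch. It assumes the throughput values appear as `Tx Throughput : <n> KB/s` and `Rx Throughput : <n> KB/s` lines in the full `nvidia-smi -q` report, and that unsupported GPUs report `N/A`; both the layout and the helper name `parsePCIeThroughput` are assumptions for illustration, not part of the exporter's current code.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// throughputLine matches lines such as "Tx Throughput : 1500 KB/s".
// The exact layout is an assumption about the nvidia-smi report format.
var throughputLine = regexp.MustCompile(`^\s*(Tx|Rx) Throughput\s*:\s*(.+?)\s*$`)

// parsePCIeThroughput (hypothetical helper) returns throughput in KB/s,
// keyed by "tx" and "rx". Non-numeric values such as "N/A" are skipped,
// so GPUs without the counters simply yield no metric instead of an error.
func parsePCIeThroughput(output string) map[string]float64 {
	result := make(map[string]float64)
	for _, line := range strings.Split(output, "\n") {
		m := throughputLine.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		raw := strings.TrimSpace(strings.TrimSuffix(m[2], "KB/s"))
		value, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			continue // e.g. "N/A" on unsupported hardware
		}
		result[strings.ToLower(m[1])] = value
	}
	return result
}

func main() {
	// Sample text written by hand for this sketch, not captured from a real GPU.
	sample := "    Tx Throughput : 1500 KB/s\n    Rx Throughput : N/A\n"
	fmt.Println(parsePCIeThroughput(sample))
}
```

Skipping values that fail to parse, rather than checking the architecture up front, is one possible way to stay safe on pre-Maxwell GPUs; an explicit architecture check would work as well.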

Describe alternatives you've considered

There are no other solutions as clean as this. I don't see anyone wanting to write a second exporter just for these metrics, and adding more calls to nvidia-smi is probably not a wise move at the system level.

Additional context

The fields in question are discussed in the nvidia-smi documentation. Once this issue is resolved, we could update the Grafana dashboard to include counters and gauges for PCIe traffic.

Hi, thank you for the suggestion. Lately I don't find any time to maintain the project, and I don't think it's gonna change anytime soon. But a PR would be more than welcome, if you'd be interested.