bkbilly / lnxlink

🖥 Effortlessly manage your Linux machine using MQTT.

Home Page: https://bkbilly.gitbook.io/lnxlink

Use nvidia-settings for GPU load if nvsmi load is NaN

PW999 opened this issue · comments

Describe the solution you'd like

I'm running a GTX 770 with the 470 drivers on Manjaro. For some reason, nvidia-smi doesn't report the GPU utilization properly. In fact, a lot of things the card supports don't work properly with nvidia-smi, so I assume this may affect all older generations.

nvidia-smi --query-gpu=utilization.gpu --format=csv -l 5
utilization.gpu [%]
[N/A]

As a result, min(100, round(nvidia_gpu.gpu_util, 1)) always evaluates to 100, so the card always shows 100% GPU usage.
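To illustrate why a missing reading ends up as 100%, here is a minimal Python sketch. It assumes the unavailable utilization reaches the expression as a float NaN (as the issue title suggests); `clamp_load` and `safe_load` are hypothetical names, not functions from LNXlink.

```python
import math


def clamp_load(value):
    # Same expression as quoted in the issue.
    return min(100, round(value, 1))


def safe_load(value):
    # Hypothetical guard: report "no reading" instead of a misleading 100.
    if value is None or math.isnan(value):
        return None
    return min(100, round(value, 1))


nan = float("nan")
# round(nan, 1) stays NaN, and every comparison with NaN is False,
# so min() never replaces its first candidate, 100.
print(clamp_load(nan))
print(safe_load(nan))
print(safe_load(32.04))
```

Checking for NaN before clamping is one way to tell "no data" apart from a genuinely saturated GPU.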

nvidia-settings shows the GPU utilization correctly:

nvidia-settings -c :0 -q '[gpu:0]/GPUUtilization' --terse
graphics=32, memory=20, video=0, PCIe=2
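The terse output above is a comma-separated list of key=value pairs, so it is straightforward to parse. The following is a sketch only; `parse_gpu_utilization` is a hypothetical helper, and it assumes the output always follows the format shown in this issue.

```python
def parse_gpu_utilization(output):
    """Turn 'graphics=32, memory=20, ...' into a dict of integers."""
    result = {}
    for part in output.strip().split(","):
        key, sep, value = part.partition("=")
        if not sep:
            # Skip fragments without '=' (for example, empty output).
            continue
        try:
            result[key.strip()] = int(value.strip())
        except ValueError:
            # Skip values that are not plain integers.
            continue
    return result


sample = "graphics=32, memory=20, video=0, PCIe=2\n"
print(parse_gpu_utilization(sample)["graphics"])
```

The "graphics" entry is the value that corresponds to the GPU load.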

It would be nice to have a fallback option to nvidia-settings.

Additional context

No response

If you're interested, I could try implementing this.

I am not sure how to check whether the nvidia-smi result is correct because I haven't had this issue, but I'd be happy if you could implement this.

I was thinking it might be cleaner to have separate modules then. So instead of just gpu, there could be gpu_amd, gpu_nvidia_smi and gpu_nvidia_settings, with gpu falling back to gpu_amd and gpu_nvidia_smi?

It used to be like this, but I changed it to a single module for all GPUs because it seemed easier to document and easier for users to understand.

  1. If you want, you can make a completely new module for gpu_nvidia_settings and add it to the excluded list in the consts.py file. The problem is that existing users would be affected.
  2. Another idea would be to create a new folder, called for example custom_modules, with modules that users could download and add manually.

I would recommend the 2nd approach, but it's up to you.

I went for the 2nd route and published my custom module: https://github.com/PW999/lnxlink_gpu_nvidia_settings

This is awesome!
Thanks for taking the time to implement this!
If you don't mind, I could add it to the documentation so that it's easier to find.

I have a minor comment:
You could create different identifiers so that it won't interfere with the original gpu module.

I somehow thought the name of the module would have an impact on the MQTT topics, but it doesn't, so I renamed it :) .
Feel free to add it to the documentation 👍

I've added your module to the documentation.
Thanks for your contribution to my project!

I got my hands on an older GPU, a GeForce GTX 660, on which I installed the 450 driver.
I've updated the dev version of LNXlink, which uses nvidia-smi and falls back to nvidia-settings for the load if it finds a NaN value.

I chose to use only the GPU load because the rest of the nvidia-settings results were not correct.
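The fallback described above could look roughly like the sketch below. This is not the actual LNXlink code: `query_settings_load` and `gpu_load` are hypothetical names, the command is the one quoted earlier in this issue, and the sketch assumes a reachable X display at :0.

```python
import math
import subprocess


def query_settings_load(display=":0"):
    """Hypothetical helper: read the 'graphics' load from nvidia-settings."""
    cmd = ["nvidia-settings", "-c", display,
           "-q", "[gpu:0]/GPUUtilization", "--terse"]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True,
                             timeout=5, check=True).stdout
    except (OSError, subprocess.SubprocessError):
        # Binary missing, no display, timeout, or non-zero exit code.
        return None
    for part in out.split(","):
        key, _, value = part.partition("=")
        if key.strip() == "graphics" and value.strip().isdigit():
            return float(value.strip())
    return None


def gpu_load(smi_load, fallback=query_settings_load):
    """Prefer the nvidia-smi value; use the fallback only when it is NaN."""
    if smi_load is not None and not math.isnan(smi_load):
        return min(100, round(smi_load, 1))
    settings_load = fallback()
    if settings_load is None:
        return None
    return min(100, round(settings_load, 1))
```

Only the load is taken from nvidia-settings here, matching the choice above to ignore its other values. Passing the fallback as a parameter keeps the logic testable on machines without an NVIDIA GPU.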

Isn't it great how nvidia's own software doesn't play well with its own hardware 😅

Luckily for me it works great most of the time, but I think the issues I'm having are mostly due to it running as a service (headless), which nvidia-settings doesn't like. Restarting the service usually solves the problem, which makes it even weirder.

For me it doesn't work on a headless installation.
I've tried setting XAUTHORITY as an environment variable, but it still doesn't work.
How did you manage to get information from nvidia-settings without an active DISPLAY?

PS. I am using Ubuntu Server without any graphical interface.