Use nvidia-settings for GPU load if nvsmi load is NaN
PW999 opened this issue
Describe the solution you'd like
I'm running a GTX770 using the 470 drivers on Manjaro. For some reason, nvidia-smi doesn't properly report the GPU utilization. In fact, a lot of features that the card supports don't work properly with nvidia-smi, so I'm assuming this might be an issue for all older generations.
nvidia-smi --query-gpu=utilization.gpu --format=csv -l 5
utilization.gpu [%]
[N/A]
Since the load is computed as min(100, round(nvidia_gpu.gpu_util, 1)) and the utilization is NaN, the card always shows 100% GPU usage.
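To illustrate why a NaN reading ends up as 100, here is a minimal Python sketch. It assumes the utilization arrives as a float NaN; `clamp_load` is a hypothetical helper name, not part of LNXlink.

```python
import math

nan = float("nan")

# Every comparison with NaN is False, so min() never replaces its
# first argument (100) with the NaN value.
unguarded = min(100, round(nan, 1))


def clamp_load(raw):
    """Clamp a load reading to at most 100, or return None if it is missing/NaN.

    Hypothetical helper for illustration only.
    """
    if raw is None or math.isnan(raw):
        return None
    return min(100, round(raw, 1))
```

Checking for NaN before clamping lets the caller tell "no reading" apart from "fully loaded".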
nvidia-settings shows the GPU utilization correctly:
nvidia-settings -c :0 -q '[gpu:0]/GPUUtilization' --terse
graphics=32, memory=20, video=0, PCIe=2
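The terse output above is a comma-separated list of key=value pairs, so it could be parsed roughly as follows. This is only a sketch based on the single sample line shown here; `parse_gpu_utilization` is a hypothetical name, and other driver versions may format the output differently.

```python
def parse_gpu_utilization(output):
    """Parse 'graphics=32, memory=20, video=0, PCIe=2' into a dict of ints.

    Hypothetical helper; fields that are not plain integers are skipped.
    """
    values = {}
    for part in output.strip().split(","):
        key, sep, value = part.partition("=")
        value = value.strip()
        if sep and value.isdigit():
            values[key.strip()] = int(value)
    return values


sample = "graphics=32, memory=20, video=0, PCIe=2\n"
parsed = parse_gpu_utilization(sample)
```

In this sample, the `graphics` field is the one that corresponds to the GPU load.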
It would be nice to have a fallback option to nvidia-settings
Additional context
No response
If you're interested, I could try implementing this.
I am not sure how to check whether the nvidia-smi result is correct because I haven't had this issue, but I'd be happy if you could implement this.
I was thinking it would maybe be cleaner to have separate modules then. So instead of just gpu, it could be gpu_amd, gpu_nvidia_smi and gpu_nvidia_settings, with gpu falling back to gpu_amd and gpu_nvidia_smi?
It used to be like this, but I changed it to one module for all GPUs because it seemed easier to document and easier for the user to understand.
- If you want, you can make a completely new module for gpu_nvidia_settings and add it to the excluded list in the consts.py file. The problem would be that existing users will be affected.
- Another idea would be to create a new folder called, for example, custom_modules, with modules that the user could download and add manually.
I would recommend the 2nd approach, but it's up to you.
I went for the 2nd route and published my custom module: https://github.com/PW999/lnxlink_gpu_nvidia_settings
This is awesome!
Thanks for taking the time to implement this!
If you don't mind, I could add it to the documentation so that it's easier to find.
I have a minor comment:
You could create different identifiers so that it won't interfere with the original gpu module.
I somehow thought the name of the module would have an impact on the MQTT topics, but it doesn't, so I renamed it :) .
Feel free to add it to the documentation 👍
I've added your module to the documentation.
Thanks for your contribution to my project!
I got my hands on an older GPU, the GeForce GTX 660, for which I installed the 450 driver.
I've updated the dev version of LNXlink, which uses nvidia-smi and falls back to nvidia-settings for the load if it finds a NaN value.
I chose to use only the GPU load because the rest of the nvidia-settings results were not correct.
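A fallback like this could look roughly like the sketch below. It is not the actual LNXlink code: the function names are hypothetical, only the `graphics` value is read, and the nvidia-settings command is the one posted earlier in this thread.

```python
import math
import subprocess


def query_nvidia_settings_load(gpu_index=0, display=":0"):
    """Return the 'graphics' percentage from nvidia-settings, or None on failure.

    Hypothetical helper; it needs a reachable X display to succeed.
    """
    cmd = ["nvidia-settings", "-c", display,
           "-q", f"[gpu:{gpu_index}]/GPUUtilization", "--terse"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=5, check=True)
    except (OSError, subprocess.SubprocessError):
        # Binary missing, no display, non-zero exit code, or timeout.
        return None
    for part in result.stdout.split(","):
        key, _, value = part.partition("=")
        if key.strip() == "graphics" and value.strip().isdigit():
            return int(value.strip())
    return None


def gpu_load(smi_load, fallback=query_nvidia_settings_load):
    """Prefer the nvidia-smi load; use the fallback only when it is None or NaN."""
    if smi_load is not None and not math.isnan(smi_load):
        return min(100, round(smi_load, 1))
    return fallback()
```

Passing the fallback as a parameter keeps the selection logic testable on machines without an NVIDIA card or a running X server.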
Isn't it great how nvidia's own software doesn't play well with its own hardware 😅
Luckily for me it works great most of the time, but I think the issues I'm having are mostly due to it running as a service (headless), which nvidia-settings doesn't like. Restarting the service usually solves the problem, which makes it even weirder.
For me it doesn't work on a headless installation.
I've tried setting XAUTHORITY as an environment variable, but it still doesn't work.
How did you manage to get information from nvidia-settings without having an active DISPLAY?
PS. I am using Ubuntu Server without any graphical interface.