utkuozdemir / nvidia_gpu_exporter

Nvidia GPU exporter for prometheus using nvidia-smi binary

Monitor can't turn off after Windows goes into power-saving mode

kissaev opened this issue.

When the monitor goes to sleep, it wakes up immediately. So basically, the exporter breaks monitor sleep.

To Reproduce
Steps to reproduce the behavior:

  1. When the exporter is running as a service, the monitor can't fall asleep.
  2. When I stop the service, the monitor turns off normally when it goes to sleep.

Expected behavior
The monitor turns off after the delay configured in the Windows power settings.

Console output

Model and Version

  • GPU Model: Gigabyte RTX 3060 Gaming OC
  • App version and architecture: v0.3.0 [x86_64.zip]
  • Installation method: binary download, runs as a service with nssm
  • Operating System: Windows 10 LTSC
  • Nvidia GPU driver version: Windows Studio Driver 472.84

Additional context
In Grafana I refresh the data from the exporter every 5 seconds, and the monitor turns back on every 5 seconds after it goes to sleep.

This is pretty difficult for me to debug. Can you disable metric scraping in Prometheus, put your monitor to sleep, and then make a single request from another machine (your cellphone or a laptop) to http://<YOUR_PC_IP>:9835/metrics and see if it wakes your monitor up?

This way we can be sure that the scraping itself wakes it up, and that the wake-up is not caused by something else.
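If a browser on the phone is inconvenient, the single test request can also be made from a laptop. Below is a minimal Python sketch, assuming Python 3.9+ on the other machine; the script name `scrape_once.py` is hypothetical and `<YOUR_PC_IP>` is the same placeholder as above. It sends exactly one GET and counts the returned samples, so nothing else is polling the exporter while you watch the monitor.

```python
import sys
import urllib.request


def fetch_metrics_once(url: str, timeout: float = 10.0) -> str:
    """Send exactly one HTTP GET to the exporter and return the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def sample_names(body: str) -> list[str]:
    """Return the metric name of every sample line, skipping # HELP / # TYPE comments."""
    names = []
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{labels} value` or `name value`.
        names.append(line.split("{", 1)[0].split(" ", 1)[0])
    return names


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: scrape_once.py http://<YOUR_PC_IP>:9835/metrics")
    body = fetch_metrics_once(sys.argv[1])
    print(f"received {len(sample_names(body))} samples")
```

Run it once while the monitor is asleep; if the monitor wakes at that moment, the scrape is the trigger.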

And a question for clarification: is it your whole computer waking up from sleep, or just the monitor?

If it is the whole computer, the issue might be that your Ethernet card is configured to wake the computer when it receives a network packet. To disable this behavior, open Device Manager, open your Ethernet adapter's properties and go to its Power Management tab.

Then either uncheck Allow this device to wake the computer, or make sure that Only allow a magic packet to wake the computer is checked.
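To see which devices are currently allowed to wake the machine, Windows' `powercfg /devicequery wake_armed` command lists them. The Python wrapper below is a minimal sketch, assuming it runs on the affected Windows PC with Python 3.9+; the function names are hypothetical, and treating a lone `NONE` line as an empty list is an assumption about powercfg's output format.

```python
import subprocess


def wake_armed_devices() -> list[str]:
    """List devices Windows reports as armed to wake the system (Windows only)."""
    result = subprocess.run(
        ["powercfg", "/devicequery", "wake_armed"],
        capture_output=True, text=True, check=True,
    )
    return parse_device_list(result.stdout)


def parse_device_list(output: str) -> list[str]:
    """One device name per non-empty line.

    Assumption: powercfg prints a single `NONE` line when no device is armed.
    """
    names = [line.strip() for line in output.splitlines() if line.strip()]
    return [] if names == ["NONE"] else names


if __name__ == "__main__":
    for name in wake_armed_devices():
        print(name)
```

If the Ethernet adapter shows up in the list, `powercfg /devicedisablewake "<device name>"` from an elevated prompt disarms it, which has the same effect as unchecking the box in Device Manager.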


Thanks, I will try it and let you know.

P.S. I don't put the whole PC to sleep, just the monitor.

I checked it, and yes, scraping wakes up the monitor. I stopped Prometheus and opened http://<YOUR_PC_IP>:9835/metrics from my phone, and the monitor woke up. It turns on and off again instantly every time I refresh the metrics page.

Thanks, this is helpful. Now I'll ask you to try one more thing, to find out whether the wake-up is caused by the exporter code or by querying the GPU itself.

Can you please do the following:

  1. Open a PowerShell prompt

  2. Run the following command:

    Start-Sleep -Seconds 30; nvidia-smi --query-gpu="timestamp,driver_version,count,name,serial,uuid,pci.bus_id,pci.domain,pci.bus,pci.device,pci.device_id,pci.sub_device_id,pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current,pcie.link.width.max,index,display_mode,display_active,persistence_mode,accounting.mode,accounting.buffer_size,driver_model.current,driver_model.pending,vbios_version,inforom.img,inforom.oem,inforom.ecc,inforom.pwr,gom.current,gom.pending,fan.speed,pstate,clocks_throttle_reasons.supported,clocks_throttle_reasons.active,clocks_throttle_reasons.gpu_idle,clocks_throttle_reasons.applications_clocks_setting,clocks_throttle_reasons.sw_power_cap,clocks_throttle_reasons.hw_slowdown,clocks_throttle_reasons.hw_thermal_slowdown,clocks_throttle_reasons.hw_power_brake_slowdown,clocks_throttle_reasons.sw_thermal_slowdown,clocks_throttle_reasons.sync_boost,memory.total,memory.used,memory.free,compute_mode,utilization.gpu,utilization.memory,encoder.stats.sessionCount,encoder.stats.averageFps,encoder.stats.averageLatency,ecc.mode.current,ecc.mode.pending,ecc.errors.corrected.volatile.device_memory,ecc.errors.corrected.volatile.dram,ecc.errors.corrected.volatile.register_file,ecc.errors.corrected.volatile.l1_cache,ecc.errors.corrected.volatile.l2_cache,ecc.errors.corrected.volatile.texture_memory,ecc.errors.corrected.volatile.cbu,ecc.errors.corrected.volatile.sram,ecc.errors.corrected.volatile.total,ecc.errors.corrected.aggregate.device_memory,ecc.errors.corrected.aggregate.dram,ecc.errors.corrected.aggregate.register_file,ecc.errors.corrected.aggregate.l1_cache,ecc.errors.corrected.aggregate.l2_cache,ecc.errors.corrected.aggregate.texture_memory,ecc.errors.corrected.aggregate.cbu,ecc.errors.corrected.aggregate.sram,ecc.errors.corrected.aggregate.total,ecc.errors.uncorrected.volatile.device_memory,ecc.errors.uncorrected.volatile.dram,ecc.errors.uncorrected.volatile.register_file,ecc.errors.uncorrected.volatile.l1_cache,ecc.errors.uncorrected.volatile.l2_cache,ecc.errors.uncorrected.volatile.texture_memory,ecc.errors.uncorrected.volatile.cbu,ecc.errors.uncorrected.volatile.sram,ecc.errors.uncorrected.volatile.total,ecc.errors.uncorrected.aggregate.device_memory,ecc.errors.uncorrected.aggregate.dram,ecc.errors.uncorrected.aggregate.register_file,ecc.errors.uncorrected.aggregate.l1_cache,ecc.errors.uncorrected.aggregate.l2_cache,ecc.errors.uncorrected.aggregate.texture_memory,ecc.errors.uncorrected.aggregate.cbu,ecc.errors.uncorrected.aggregate.sram,ecc.errors.uncorrected.aggregate.total,retired_pages.single_bit_ecc.count,retired_pages.double_bit.count,retired_pages.pending,temperature.gpu,temperature.memory,power.management,power.draw,power.limit,enforced.power.limit,power.default_limit,power.min_limit,power.max_limit,clocks.current.graphics,clocks.current.sm,clocks.current.memory,clocks.current.video,clocks.applications.graphics,clocks.applications.memory,clocks.default_applications.graphics,clocks.default_applications.memory,clocks.max.graphics,clocks.max.sm,clocks.max.memory,mig.mode.current,mig.mode.pending" --format=csv

  3. Immediately put your monitor to sleep. The command waits 30 seconds and then runs the exact command the exporter runs when its scrape endpoint is called.

  4. Observe whether the monitor wakes up after 30 seconds, when the nvidia-smi command runs.

Also, please check the PowerShell output afterwards to make sure the command actually ran in the background.
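Steps 2 to 4 can also be scripted so that the moment nvidia-smi runs is printed next to its output, which makes it easier to match the call to the wake-up. This is a minimal Python sketch, assuming Python 3.9+ and `nvidia-smi` on `PATH`; `FIELDS` is a hypothetical, shortened subset of the field list above (paste the full list to reproduce the exporter's exact query), and the function names are illustrative only.

```python
import subprocess
import time
from datetime import datetime

# Shortened for illustration; these fields all appear in the full list in step 2.
FIELDS = ["timestamp", "name", "display_active", "pstate", "power.draw"]


def build_query_command(fields: list[str]) -> list[str]:
    """Build the nvidia-smi argument list for a CSV query of the given fields."""
    # No extra quoting is needed: subprocess passes each list item as one argument.
    return ["nvidia-smi", "--query-gpu=" + ",".join(fields), "--format=csv"]


def delayed_query(fields: list[str], delay_s: float = 30.0) -> str:
    """Wait delay_s seconds, then run one nvidia-smi query and return its stdout."""
    time.sleep(delay_s)
    started = datetime.now().strftime("%H:%M:%S")
    result = subprocess.run(
        build_query_command(fields), capture_output=True, text=True, check=True
    )
    # Local time of the call, to compare with when the monitor woke up.
    print(f"nvidia-smi ran at {started}")
    return result.stdout


if __name__ == "__main__":
    print("Put the monitor to sleep now; querying in 30 seconds...")
    print(delayed_query(FIELDS))
```

Running it once with the full field list and once with a short one is a cheap way to see whether any nvidia-smi query wakes the monitor, rather than only the exporter's particular set of fields.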

Yes, the monitor wakes up from this command.

OK, then this is not something that can be solved inside the exporter; it is the behavior of the nvidia-smi tool itself. My recommendations are the following:

  • Do a clean reinstall of the Nvidia driver
  • Try older/newer driver versions
  • Try the Game Ready driver
  • Investigate the Windows power options and NVIDIA Control Panel settings that might cause such behavior

Please let me know if you can pinpoint it, so I can add it to the documentation.

I will close this ticket, since it seems there's nothing that can be done on exporter level. If you find out something that indicates otherwise, feel free to report it, so we can re-open.
