gpustat uses too much CPU
matthew-z opened this issue
What's your interval value?
I didn't set one, but gpustat -i 1 reproduces the same result.
Can you test whether watch -n 1 nvidia-smi and watch -n 1 gpustat use the same amount of CPU time?
I tested, and with watch -n 1 they use the same amount of CPU time (about 0-20%).
Hi, I just found that the CPU time problem of gpustat -i can be solved by running nvidia-smi daemon first.
A difference is that in watch mode (i.e., gpustat -i) the handle resources are fetched at every time step, which is somewhat expensive. Therefore we could optimize by fetching the GPU handles only once at the beginning and reusing the (cached) resources, as in the sketch below. This would be possible in watch mode because the gpustat process won't terminate until interrupted.
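For illustration, a minimal sketch of that caching idea using the pynvml bindings directly (the loop body and names are illustrative, not gpustat's actual internals):

```python
import time
import pynvml

pynvml.nvmlInit()
# Fetch the expensive device handles once, before the refresh loop starts.
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

try:
    while True:
        for i, handle in enumerate(handles):
            # Per-tick queries reuse the cached handle instead of refetching it.
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            print("GPU %d: %d%% util" % (i, util.gpu))
        time.sleep(1)  # corresponds to the -i interval
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```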
Top four most expensive operations (%Own, %Total, OwnTime, TotalTime, Function):
36.50% 36.50% 4.59s 4.59s nvmlDeviceGetHandleByIndex (pynvml.py:945)
16.50% 16.50% 1.85s 1.85s nvmlDeviceGetPowerUsage (pynvml.py:1289)
9.50% 9.50% 1.18s 1.18s nvmlDeviceGetUtilizationRates (pynvml.py:1379)
7.50% 7.50% 0.805s 0.805s nvmlDeviceGetComputeRunningProcesses (pynvml.py:1435)
Working on this as #61.
In my case, querying the power usage is the most expensive operation, so I made it optional whenever possible; a sketch of the idea follows. Could anybody check whether it leads to lower CPU usage?
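A minimal sketch of what "optional" could look like, assuming pynvml's exception type (the helper name is hypothetical, not gpustat's API):

```python
import pynvml

def query_power_mw(handle, enabled=True):
    """Return the power draw in milliwatts, or None if disabled/unsupported."""
    if not enabled:
        return None
    try:
        return pynvml.nvmlDeviceGetPowerUsage(handle)  # reported in milliwatts
    except pynvml.NVMLError:
        # e.g. not supported on some devices; skip rather than fail
        return None
```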
But I still have no idea about the difference between "watch -n 1 gpustat" and "gpustat -i 1".
Both of them need to call print_gpustat() every tick, while 'watch' requires the additional step of parsing command-line arguments again and again, so intuitively the former should take longer.
BTW, here is the source code of 'watch' if needed: watch.c, where I found nothing useful :(
Has this issue been resolved? I am observing this behavior in https://github.com/ray-project/ray/ when we run gpustat.new_query() repeatedly on GCE.
Lots of time is spent in nvmlInit, nvmlShutdown, and nvmlDeviceGetHandleByIndex.
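A minimal sketch that reproduces this pattern and profiles where the time goes (poll() is a hypothetical helper, not Ray's code):

```python
import cProfile
import gpustat

def poll(n=100):
    # Repeated queries: each new_query() call re-runs NVML setup
    # (init/shutdown and handle lookups) on every iteration.
    for _ in range(n):
        gpustat.new_query()

cProfile.run("poll()", sort="cumtime")
```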
In recent versions of pynvml, nvmlDeviceGetHandleByIndex doesn't seem to be a bottleneck according to the profiling results (if it is still slow, please let me know), so I did not optimize away the redundant calls to nvmlDeviceGetHandleByIndex. #166 makes nvmlInit() be called only once, so it should bring some performance benefit.
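A minimal sketch of the init-once idea (illustrative names, not the actual internals of #166):

```python
import atexit
import pynvml

_nvml_initialized = False

def ensure_nvml_initialized():
    """Run nvmlInit() on the first query only; defer shutdown to process exit."""
    global _nvml_initialized
    if not _nvml_initialized:
        pynvml.nvmlInit()
        atexit.register(pynvml.nvmlShutdown)
        _nvml_initialized = True
```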