gpustat uses too much CPU
matthew-z opened this issue
What's your interval value?
I didn't set one, but gpustat -i 1 reproduces the same result.
Can you test whether watch -n 1 nvidia-smi and watch -n 1 gpustat use the same amount of CPU time?
I tested, and with watch -n 1 they use the same amount of CPU time (about 0-20%).
Hi, I just found that the CPU time problem of gpustat -i can be solved by running nvidia-smi daemon first.
A difference is that in watch mode (i.e., gpustat -i) the handle resources are fetched at every time step, which is somewhat expensive. Therefore we could optimize by fetching the GPU handles only once at the beginning and reusing the (cached) resources, as in the sketch below. This would be possible in watch mode because the gpustat process won't terminate until interrupted.
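For illustration, a minimal sketch of that caching idea using the pynvml bindings directly (the loop body and names are illustrative, not gpustat's actual internals):

```python
import time
import pynvml

pynvml.nvmlInit()
# Fetch the expensive device handles once, before the refresh loop starts.
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

try:
    while True:
        for i, handle in enumerate(handles):
            # Per-tick queries reuse the cached handle instead of refetching it.
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            print("GPU %d: %d%% util" % (i, util.gpu))
        time.sleep(1)  # corresponds to the -i interval
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```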
Top four most expensive operations (%Own, %Total, OwnTime, TotalTime, Function):
36.50% 36.50% 4.59s 4.59s nvmlDeviceGetHandleByIndex (pynvml.py:945)
16.50% 16.50% 1.85s 1.85s nvmlDeviceGetPowerUsage (pynvml.py:1289)
9.50% 9.50% 1.18s 1.18s nvmlDeviceGetUtilizationRates (pynvml.py:1379)
7.50% 7.50% 0.805s 0.805s nvmlDeviceGetComputeRunningProcesses (pynvml.py:1435)
Working on this as #61.
In my case, querying the power usage is the most expensive operation, so I made it optional whenever possible; a sketch of the idea follows. Could anybody check whether it leads to lower CPU usage?
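A minimal sketch of what "optional" could look like, assuming pynvml's exception type (the helper name is hypothetical, not gpustat's API):

```python
import pynvml

def query_power_mw(handle, enabled=True):
    """Return the power draw in milliwatts, or None if disabled/unsupported."""
    if not enabled:
        return None
    try:
        return pynvml.nvmlDeviceGetPowerUsage(handle)  # reported in milliwatts
    except pynvml.NVMLError:
        # e.g. not supported on some devices; skip rather than fail
        return None
```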
But I still have no idea about the difference between "watch -n 1 gpustat" and "gpustat -i 1".
Both of them need to call print_gpustat() every tick, while 'watch' requires the additional step of parsing command-line arguments again and again, so intuitively the former should take longer.
BTW, here is the source code of 'watch' if needed: watch.c, where I found nothing useful :(
Has this issue been resolved? I am observing this behavior in https://github.com/ray-project/ray/ when we run gpustat.new_query() repeatedly on GCE.
Lots of time is spent in nvmlInit, nvmlShutdown, and nvmlDeviceGetHandleByIndex.
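A minimal sketch that reproduces this pattern and profiles where the time goes (poll() is a hypothetical helper, not Ray's code):

```python
import cProfile
import gpustat

def poll(n=100):
    # Repeated queries: each new_query() call re-runs NVML setup
    # (init/shutdown and handle lookups) on every iteration.
    for _ in range(n):
        gpustat.new_query()

cProfile.run("poll()", sort="cumtime")
```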
In recent versions of pynvml, nvmlDeviceGetHandleByIndex doesn't seem to be a bottleneck according to the profiling results (if it is still slow, please let me know), so I did not optimize away the redundant calls to nvmlDeviceGetHandleByIndex. #166 makes nvmlInit() be called only once, so it should bring some performance benefit.
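A minimal sketch of the init-once idea (illustrative names, not the actual internals of #166):

```python
import atexit
import pynvml

_nvml_initialized = False

def ensure_nvml_initialized():
    """Run nvmlInit() on the first query only; defer shutdown to process exit."""
    global _nvml_initialized
    if not _nvml_initialized:
        pynvml.nvmlInit()
        atexit.register(pynvml.nvmlShutdown)
        _nvml_initialized = True
```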