utkuozdemir / nvidia_gpu_exporter

Nvidia GPU exporter for Prometheus using the nvidia-smi binary


Metric per process/pod

andlogreg opened this issue

Is it possible to see memory utilization per process instead of just the total memory usage on a specific GPU?

If not, this could be quite useful. Given that this information is already available through nvidia-smi, I imagine it should be doable.

The feature does not exist yet, but if it is possible to get that data using nvidia-smi, it should be fairly straightforward to implement. I might look into it at some point, but I don't know when (and no promises).
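The data is indeed available: nvidia-smi's --query-compute-apps flag lists every compute process with its PID, name and used memory. As a rough illustration of what an implementation could look like (the exporter is written in Go), here is a minimal, hypothetical sketch that polls that query and exposes a per-process gauge. The metric name, labels, port and polling interval are assumptions for the example, not anything this project actually defines.

package main

import (
	"encoding/csv"
	"log"
	"net/http"
	"os/exec"
	"strconv"
	"strings"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical metric; name and labels are illustrative, not the exporter's own.
var processMemory = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "nvidia_smi_compute_process_used_memory_bytes",
		Help: "GPU memory used per compute process, as reported by nvidia-smi.",
	},
	[]string{"gpu_uuid", "pid", "process_name"},
)

func collect() error {
	// With nounits, used_memory is reported as a plain number of MiB.
	out, err := exec.Command("nvidia-smi",
		"--query-compute-apps=gpu_uuid,pid,process_name,used_memory",
		"--format=csv,noheader,nounits").Output()
	if err != nil {
		return err
	}
	r := csv.NewReader(strings.NewReader(string(out)))
	r.TrimLeadingSpace = true
	records, err := r.ReadAll()
	if err != nil {
		return err
	}
	processMemory.Reset() // drop series for processes that have exited
	for _, rec := range records {
		if len(rec) != 4 {
			continue
		}
		mib, err := strconv.ParseFloat(rec[3], 64)
		if err != nil {
			continue
		}
		processMemory.WithLabelValues(rec[0], rec[1], rec[2]).Set(mib * 1024 * 1024)
	}
	return nil
}

func main() {
	prometheus.MustRegister(processMemory)
	go func() {
		for {
			if err := collect(); err != nil {
				log.Printf("collect: %v", err)
			}
			time.Sleep(15 * time.Second)
		}
	}()
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9835", nil))
}

One wrinkle a real implementation would need to handle is label churn: PIDs come and go, so resetting the gauge vector before each poll (as above) keeps series for dead processes from lingering in the scrape output.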

👍 for this functionality.

If it is of any use: a few months ago I needed to know which process and user was running what on each GPU of a server (plus memory and elapsed time). Since nvidia-smi did not provide such details, I ended up combining it with simple ps calls (which depend on the host OS, so they may not be a great option). Basically, it went something like this:

# get gpus
nvidia-smi --query-gpu=index,uuid,gpu_name --format=csv

# get running processes
nvidia-smi --query-compute-apps=pid,process_name,gpu_name,gpu_uuid,used_memory --format=csv

# for each process, get the owning user; add -o lstart, -o etimes or -o cmd for other details
ps -ww -p "$pid" -o user=

I initially tried to get everything about each process in one go, but had trouble parsing the output, so I just hacked this up, which served the purpose at the time. The results were then turned into a decent table, taking into account that a process might be running on multiple GPUs and that several processes can share a single GPU.
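For completeness, here is a small hypothetical sketch of the same join in Go (the language this exporter is written in): list compute processes via nvidia-smi, then ask ps for the owning user of each PID. As noted above, the ps step assumes a procps-style host OS, and a process can exit between the two calls, so the lookup has to tolerate failure.

package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os/exec"
	"strings"
)

// userForPID shells out to ps; the trailing "=" in "user=" suppresses the header.
func userForPID(pid string) string {
	out, err := exec.Command("ps", "-ww", "-p", pid, "-o", "user=").Output()
	if err != nil {
		return "unknown" // the process may have exited between the two calls
	}
	return strings.TrimSpace(string(out))
}

func main() {
	out, err := exec.Command("nvidia-smi",
		"--query-compute-apps=gpu_uuid,pid,process_name,used_memory",
		"--format=csv,noheader").Output()
	if err != nil {
		log.Fatal(err)
	}
	r := csv.NewReader(strings.NewReader(string(out)))
	r.TrimLeadingSpace = true
	records, err := r.ReadAll()
	if err != nil {
		log.Fatal(err)
	}
	for _, rec := range records {
		if len(rec) != 4 {
			continue
		}
		fmt.Printf("gpu=%s pid=%s user=%s name=%s mem=%s\n",
			rec[0], rec[1], userForPID(rec[1]), rec[2], rec[3])
	}
}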

Hi. Are there any plans for an update on this?

Hi, lately I don't have any time for my personal projects, so I'm afraid this won't land anytime soon.

PRs are always welcome though - if they are of good quality and have tests (so they don't require much effort on my side), I'd merge them and make new releases.

Duplicate of #190, let's track this on that one.