ROCm / ROC-smi

ROC System Management Interface

Home Page: https://github.com/RadeonOpenCompute/ROC-smi/blob/master/README.md


GPU memory usage

micmelesse opened this issue

Are there any plans to add a feature to show GPU memory usage? Is there anything I can do to help with that effort?

Yes, this is one of our future requirements. We are working with the HW/firmware team on this.

Just did a PR. #43

We are waiting for amdgpu to move the VRAM usage from debugfs to sysfs, as that will give us a more stable interface (debugfs is generally considered an unstable and fluid interface, while sysfs is much more stable and defined). For now, the values can be obtained from "sudo cat /sys/kernel/debug/dri/0/amdgpu_vram_mm | tail -n 2", since the dri/# directly correlates to drm/card#. The topology nodes don't directly correlate to DRM devices, which means your PR won't work, unfortunately. Hopefully we'll have this for 2.0 or 2.1. I don't have a time commitment yet from amdgpu, but it should be coming soon.
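For anyone who wants to script the debugfs workaround above, here is a minimal Python sketch. It only returns the last two lines of the file, like "tail -n 2", and does not try to parse them, since debugfs output is not a stable format. The function name vram_mm_tail and its parameters are made up for illustration; they are not part of rocm-smi.

```python
from pathlib import Path


def vram_mm_tail(card=0, debugfs_root="/sys/kernel/debug/dri", lines=2):
    """Return the last `lines` lines of amdgpu_vram_mm for dri/<card>.

    Hypothetical helper, roughly equivalent to:
        sudo cat /sys/kernel/debug/dri/<card>/amdgpu_vram_mm | tail -n 2
    """
    path = Path(debugfs_root) / str(card) / "amdgpu_vram_mm"
    # debugfs is normally readable by root only, so this raises
    # PermissionError (or FileNotFoundError) when run unprivileged.
    text = path.read_text()
    return text.splitlines()[-lines:]
```

Run as root, something like print("\n".join(vram_mm_tail(0))) should show the same two lines as the shell command, assuming dri/0 is the amdgpu device.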

Update: This will be in the 2.3 release

If anyone is stumbling over this issue and is interested in getting the percentage of used VRAM: this patch hacks it into the standard output of rocm-smi, but sacrifices the default 80 cols width of the output.

Thanks @sebpuetz, this makes my life a lot easier! I am going to pull your patch in as well, though changing it slightly to keep to 80 chars. It should hopefully make it into 2.5.

Closing this since it's in 2.3. Support for showing the VRAM usage as a percentage in the concise output (no flags given) will be in 2.5.
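For readers on a version without the percentage column, a rough sketch of computing it by hand. This is not the code from the patch or from rocm-smi. It assumes the amdgpu sysfs files are named mem_info_vram_used and mem_info_vram_total and each hold a single byte count; the function name vram_used_percent is hypothetical.

```python
from pathlib import Path


def vram_used_percent(device_dir="/sys/class/drm/card0/device"):
    """Return used VRAM as a percentage of total VRAM for one device.

    Assumption: mem_info_vram_used and mem_info_vram_total exist in
    device_dir and each contain one integer (bytes).
    """
    device = Path(device_dir)
    used = int((device / "mem_info_vram_used").read_text())
    total = int((device / "mem_info_vram_total").read_text())
    if total <= 0:
        raise ValueError("total VRAM must be positive")
    return 100.0 * used / total
```

If the files are missing, this raises FileNotFoundError, which usually means the kernel does not expose them.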

This is not working for me on ROCm 2.4 with a gfx803 on mainline kernel 5.0.14.

I see, for example, "Unable to get vram memory usage information" in the output of "rocm-smi --showmeminfo all". Thanks to the above thread, I was able to get this information from /sys/kernel/debug/dri/0/amdgpu_vram_mm, but it seems that the needed info is not available in sysfs for me. Is there something I need to do to expose this information in the way the rocm-smi script is expecting?

The sysfs files that the SMI depends on are not part of the mainline 5.0.14 kernel. They're only in ROCK 2.3-and-newer, and Linux 5.1-rc2-and-newer. You'll need my kernel patch in the kernel to make it work (see https://lists.freedesktop.org/archives/amd-gfx/2019-February/031900.html ). If that commit isn't in your kernel (as is the case for 5.0.14), then the required sysfs files won't be present, and the SMI won't be able to read them.

For reference, you can verify if they're there by doing "ls /sys/class/drm/card0/device/mem_info*" and seeing if it returns 6 files, or 0.
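The same check can be scripted. Below is a small Python sketch that globs for the mem_info* files and reports whether any were found; the function name mem_info_files and its parameters are made up for illustration, and card0 is assumed to be the amdgpu device.

```python
import glob
import os


def mem_info_files(card=0, drm_root="/sys/class/drm"):
    """List the mem_info* sysfs files for drm/card<card>.

    Hypothetical helper, equivalent to:
        ls /sys/class/drm/card<card>/device/mem_info*
    An empty list suggests the kernel lacks the required patch.
    """
    pattern = os.path.join(drm_root, f"card{card}", "device", "mem_info*")
    return sorted(glob.glob(pattern))


def has_mem_info(card=0, drm_root="/sys/class/drm"):
    """Return True if at least one mem_info* file is present."""
    return len(mem_info_files(card, drm_root)) > 0
```

An empty result from mem_info_files(0) corresponds to the "0 files" case above, where rocm-smi has nothing to read.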

Thank you, that clears it up!

EDIT: It seems this is not in 5.1, but I do see it on the kernel master, so here's looking forward to 5.2!