XuehaiPan / nvitop

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Home Page: https://nvitop.readthedocs.io

[Feature Request] Add support to AMD's ROCm GPU

Junyi-99 opened this issue

Required prerequisites

  • I have searched the Issue Tracker and confirmed that this hasn't already been reported. (Comment there if it has.)
  • I have tried the latest version of nvitop in a new isolated virtual environment.

Motivation

I have been using nvitop for monitoring NVIDIA devices and processes, and I find it to be a great tool with a beautiful UI. Thank you for this good project!

However, I noticed that it doesn't support AMD's ROCm GPU platform. As an AMD user (we have an AMD GPU cluster), I can only use "rocm-smi" to monitor my GPUs, and I would love to have a tool like nvitop for ROCm.

I believe that adding support for AMD's ROCm GPU would make nvitop a more versatile and inclusive monitoring tool. It would allow users who work with AMD GPUs to benefit from the same features and options that nvitop provides to NVIDIA users.

Solution

Using rocm-smi
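
A minimal sketch of what this could look like, under stated assumptions: rocm-smi can print JSON (`--json`, combined here with `--showuse` and `--showmeminfo vram`), and a monitor could poll and parse that output. The key names used below (`GPU use (%)`, `VRAM Total Memory (B)`, `VRAM Total Used Memory (B)`) differ between ROCm releases and should be treated as assumptions. `parse_rocm_smi` and `query_rocm_smi` are hypothetical helpers, not part of nvitop.

```python
import json
import subprocess


def parse_rocm_smi(json_text):
    """Turn rocm-smi JSON output into a list of per-GPU dicts.

    The key names are assumptions; they differ between ROCm releases.
    """
    devices = []
    for name, fields in sorted(json.loads(json_text).items()):
        if not name.startswith("card"):
            continue  # skip non-device entries such as "system"
        devices.append({
            "name": name,
            "gpu_util_percent": float(fields.get("GPU use (%)", 0)),
            "memory_used_bytes": int(fields.get("VRAM Total Used Memory (B)", 0)),
            "memory_total_bytes": int(fields.get("VRAM Total Memory (B)", 0)),
        })
    return devices


def query_rocm_smi():
    """Run rocm-smi (must be on PATH) and parse its JSON output."""
    result = subprocess.run(
        ["rocm-smi", "--showuse", "--showmeminfo", "vram", "--json"],
        capture_output=True, text=True, check=True,
    )
    return parse_rocm_smi(result.stdout)
```

Keeping the parser separate from the subprocess call means it can be exercised with captured output on a machine that has no AMD GPU.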

Alternatives

No response

Additional context

No response

Hi @Junyi-99, thanks for raising this, and I apologize for the late response. I'm afraid that AMD graphics cards may not be supported in nvitop in the foreseeable future. This is due to a lack of necessary Python dependencies on PyPI (there is only an example in RadeonOpenCompute/rocm_smi_lib) and I personally don't have access to a machine with an AMD graphics card for testing. Sorry about that.

nvitop is completely open-source, and you are welcome to fork it and develop your own monitoring tool.

Thanks for your reply and for digging into ROCm monitoring solutions. I will keep following this project. I think this issue can be closed.

Thank you again.

Thank you so much for this work. It has become an important part of my workflow.
But recently I started working with other GPUs, and it's very inconvenient without nvitop.
Could you point out which parts need to be modified to replace the nvidia-smi related commands?

What do you mean by "to replace nvidia-smi related commands"?

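The monitoring tool for the GPU I'm using now is very similar to NVIDIA's.
So I was wondering: if I replace the NVIDIA monitoring tool's API calls with the API of my current GPU's monitoring tool, would I then be able to use nvitop?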

I think that would work.

Which parts would I need to modify to achieve this?

Sorry, you will need to read the source code yourself and find the places that need to be modified.

OK, thanks for the reply. I'll take a careful look at the code.
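
As a rough illustration of the idea discussed above (swapping the vendor-specific query layer while keeping the display logic), here is a minimal sketch. It is not nvitop's actual architecture: `DeviceSnapshot`, `GpuBackend`, `FakeBackend`, and `render` are all hypothetical names. A real port would have to locate nvitop's NVML calls in its source and supply equivalent values from the other vendor's tooling.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class DeviceSnapshot:
    """Vendor-neutral values that the display code needs for one GPU."""
    name: str
    gpu_util_percent: float
    memory_used_bytes: int
    memory_total_bytes: int


class GpuBackend(Protocol):
    """Anything that can report the current state of its devices."""
    def snapshots(self) -> List[DeviceSnapshot]: ...


class FakeBackend:
    """Stand-in backend with fixed values, used here only for illustration."""
    def snapshots(self) -> List[DeviceSnapshot]:
        return [DeviceSnapshot("gpu0", 37.0, 2 * 1024**3, 16 * 1024**3)]


def render(backend: GpuBackend) -> List[str]:
    """Format one line per device, without knowing which vendor is behind it."""
    lines = []
    for d in backend.snapshots():
        total = d.memory_total_bytes
        mem_pct = 100.0 * d.memory_used_bytes / total if total else 0.0
        lines.append(f"{d.name}: util {d.gpu_util_percent:.0f}%, mem {mem_pct:.1f}%")
    return lines
```

With this split, supporting another GPU vendor means writing one more class with a `snapshots` method, and `render` stays unchanged.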

I've successfully adapted nvitop for AMD platforms using ROCm and tested it on MI50, MI100, and MI210 machines without affecting NVIDIA functionality.

I'd like to contribute these changes to extend nvitop's usability to AMD users.

Looking forward to your feedback!

  • MI50: [screenshot: mi50]
  • MI100: [screenshot: mi100]
  • MI210: [screenshot: mi210]

🤗 Really looking forward to nvitop's support for AMD GPUs, just like ROCm offers CUDA compatibility!