[Issue]: Installing rocm
BujSet opened this issue · comments
Problem Description
I have been trying to install rocm on my machine for quite some time without success.
While reporting this issue, I ran
echo "GPU:" && /opt/rocm/bin/rocminfo | grep -E "^\s*(Name|Marketing Name)";
and had the following output:
GPU:
Which was concerning; however, I do see that the GPU is discoverable via:
sudo lshw -numeric -C display
*-display UNCLAIMED
description: VGA compatible controller
product: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67DF]
vendor: Advanced Micro Devices, Inc. [AMD/ATI] [1002]
physical id: 0
bus info: pci@0000:09:00.0
version: c7
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list
configuration: latency=0
resources: memory:e0000000-efffffff memory:f0000000-f01fffff ioport:2000(size=256) memory:f0a00000-f0a3ffff memory:f0a60000-f0a7ffff
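The UNCLAIMED tag in the lshw output above generally means that no kernel driver has bound to the device. A minimal sketch for checking this from a script, assuming lshw's usual `*-display UNCLAIMED` line format; the function name `display_claim_status` is hypothetical and not part of any ROCm tool:

```shell
# Hypothetical helper: reads lshw text on stdin and reports whether a
# display device is marked UNCLAIMED (no kernel driver bound to it).
display_claim_status() {
    if grep -q '[*]-display.*UNCLAIMED'; then
        echo "unclaimed"
    else
        echo "claimed"
    fi
}

# Usage on a real machine (needs lshw and root):
#   sudo lshw -numeric -C display | display_claim_status
```

If this prints "unclaimed", the GPU is visible on the PCI bus but has no driver attached, which would be consistent with rocminfo listing no devices.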
My kernel version is:
uname -srmv
Linux 5.15.0-91-generic #101~20.04.1-Ubuntu SMP Thu Nov 16 14:22:28 UTC 2023 x86_64
Which seems to satisfy the supported OS requirements listed here.
Operating System
Ubuntu 20.04.6 LTS (Focal Fossa)
CPU
AMD Ryzen 7 2700X Eight-Core Processor
GPU
AMD Instinct MI300X
ROCm Version
ROCm 6.0.0
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
ROCk module is NOT loaded, possibly no GPU devices
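A quick way to confirm the message above is to look for the kernel module directly. This is a minimal sketch, assuming the "ROCk module" refers to the amdgpu kernel module; the helper name `module_loaded` is hypothetical, and the optional second argument exists only so the check can be run against a saved copy of /proc/modules:

```shell
# Hypothetical helper: succeeds (exit 0) if the named kernel module is listed.
# Usage: module_loaded NAME [MODULES_FILE]   (defaults to /proc/modules)
module_loaded() {
    mod="$1"
    modules_file="${2:-/proc/modules}"
    # /proc/modules has one module per line, with the module name in column 1.
    awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Usage on a real machine:
#   if module_loaded amdgpu; then echo "amdgpu loaded"; else echo "amdgpu NOT loaded"; fi
```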
Additional Information
Looking into it, it seems that rocm6.0 may not support my device; however, I'm unable to determine what version of rocm I need. A pointer to the right version installation docs would be helpful here, thanks!
Also, the issue dialogue made me select a GPU, but didn't list my GPU, so I chose MI300X.
In the ROCm 6.0 release notes, there's a tick for MI300X for Ubuntu 22.04.5 only, so I would try that first.
If your company does not mind an Arch distribution, you can try it too. The community-maintained packages work well, even on newer kernels.
Just remember, you need ROCm 6.0, so enable the extra-testing repo to get the rocm6.0 package on Arch.
Sorry, maybe I wasn't clear in my original post. I do not have an MI300 GPU; I believe I have an Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] device. I tried installing rocm6.0, but it seems that my device is not compatible, so I'm trying to ascertain what version of rocm I can install that is compatible with my product. I only selected MI300 since the dropdown did not list my GPU as a valid option.
If that's the case, you will need to search around the internet for help.
The latest version of ROCm will not support such old cards; maybe try an older version with a few hacks to get it working.
Ah I see, I was looking through the docs, hoping to find a table of some sort that could help identify which versions of rocm are supported on which devices, but no such luck. Will keep searching, thanks anyway though!
@BujSet, Ellesmere was once officially supported, but the last version of ROCm that was tested on Ellesmere was ROCm 3.5. You can probably still use the modern amdgpu-dkms, but you'll need to use the old rocm repo for HIP and the math libraries. Support for Ellesmere was still enabled in a number of later releases, but I'm not sure how well it worked since it was not tested. On some later releases, you need to set the environment variable ROC_ENABLE_PRE_VEGA=1 for your GPU to be recognized.
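As a rough sketch of how that variable could be applied to a single command without exporting it for the whole session, something like the following should work. The wrapper name `with_pre_vega` is hypothetical; only the variable name and value come from the comment above, and whether a given release honors it is untested here:

```shell
# Hypothetical wrapper: runs the given command with ROC_ENABLE_PRE_VEGA=1
# set in that command's environment only.
with_pre_vega() {
    ROC_ENABLE_PRE_VEGA=1 "$@"
}

# Usage on a real machine:
#   with_pre_vega /opt/rocm/bin/rocminfo
```

To apply it to every command in the current shell instead, `export ROC_ENABLE_PRE_VEGA=1` does the same thing more broadly.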
IIRC, these ROCm 3.5 packages were built for Ubuntu 18.04:
https://repo.radeon.com/rocm/apt/3.5.1/