pop-os / pop

A project for managing all Pop!_OS sources

Home Page: https://system76.com/pop

ROCm breaks the system after recent update.

loafylemon opened this issue · comments

Distribution (run cat /etc/os-release):

NAME="Pop!_OS"
VERSION="22.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 22.04 LTS"
VERSION_ID="22.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=jammy
UBUNTU_CODENAME=jammy
LOGO=distributor-logo-pop-os

Hardware information:

AMD Ryzen 7 7800X3D 8-Core Processor
AMD Radeon 7900 XTX

Related Application and/or Package Version (run apt policy $PACKAGE NAME):
N/A

Issue/Bug Description:
After the recent update, the system would not go past the GNOME 'Something went wrong :(' screen. I have since reinstalled Pop!_OS to find the point of failure, and the issue reoccurred after I installed ROCm and rebooted. I must emphasise that the problem did not occur before the recent update.

Logs:
dump.log

Steps to reproduce (if you know):

  1. Download and install amdgpu-install.
sudo apt update
wget https://repo.radeon.com/amdgpu-install/6.0.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb
sudo apt install ./amdgpu-install_6.0.60002-1_all.deb
  2. Add Pop!_OS to the supported list.
sudo nano /usr/bin/amdgpu-install
case "$ID" in
ubuntu|linuxmint|debian|pop)
                         ^
  3. Install ROCm.
sudo amdgpu-install --usecase=rocm --no-dkms
  4. Reboot.
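
For anyone following along, step 2 matters because the installer script appears to gate on the ID field from /etc/os-release. Below is a minimal sketch of that kind of check, assuming the script's logic resembles the case line quoted above. The function name is_supported_id is hypothetical and is not part of amdgpu-install.

```shell
#!/bin/sh
# Hypothetical helper, not taken from amdgpu-install: it mirrors the
# case "$ID" pattern quoted in step 2, with "pop" added to the list.
is_supported_id() {
    case "$1" in
        ubuntu|linuxmint|debian|pop) return 0 ;;
        *) return 1 ;;
    esac
}

# Read ID from os-release in a subshell so its variables do not leak
# into the current shell (ID=pop on Pop!_OS).
if [ -r /etc/os-release ]; then
    current_id=$(. /etc/os-release && printf '%s' "${ID:-}")
    if is_supported_id "$current_id"; then
        echo "ID=$current_id would pass the distribution check"
    else
        echo "ID=$current_id would be rejected by the distribution check"
    fi
fi
```

Without the edit in step 2, an ID of pop would fall through to the unsupported branch of a check like this one.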

Expected behavior:

Other Notes:
I am able to log in through a TTY by pressing CTRL+ALT+F4.

Things I have tried to make it work:

  • Installing another DE/WM.
  • Reconfiguring dpkg.
  • Installing an older version of ROCm (5.7).
  • Disabling hybrid graphics/iGPU.
  • Rolling back to the previous version of Pop!_OS <- This allows ROCm to work, but is not ideal.

EDIT: (3 MAR 2024)
Still broken on the new kernel.
Linux pop-os 6.6.10-76060610-generic #202401051437~1709085277~22.04~31d73d8 SMP PREEMPT_DYNAMIC Wed F x86_64 x86_64 x86_64 GNU/Linux

I have exactly the same issue.

Thank goodness I made a backup with Timeshift before experimenting with installing ROCm 6.0.2. The only way to make it work right now on Pop!_OS is by installing AMD's DKMS graphics drivers using kernel 6.5.4:

amdgpu-install --usecase=graphics,rocm

I haven't found a way to get ROCm 6.0.2 working with the open-source drivers or in a more recent kernel.

Can confirm that this is still a problem and has gotten worse. Dual monitors no longer work, opening Settings crashes the session and sends you back to the menu, and worst of all, it seems the drivers aren't even loaded. Uninstalling amdgpu does not appear to do anything, and I'm having to move over 1 TB of files and videos because I have to reinstall the entire OS.
Whatever amdgpu installs, it seems to have broken parts of the OS after this update.

If you're lucky, you can try spamming Tab during your PC's startup, then wait for the bootloader to show you the option to boot from the old kernel, current kernel, or recovery. Select the old kernel and see if that fixes it. If it does, DO NOT update until pop os figures out their shit.

until pop os figures out their shit.

Please keep in mind that we have a code of conduct.

It's been mentioned multiple times that this isn't Pop's fault. AMD's DKMS modules are often incompatible with the kernels we ship, and there's nothing we can do about it. Since System76 hardware is very bleeding-edge, we often need to ship newer kernels for hardware enablement.

At this time, we're recommending ROCm/AMDGPU-PRO users implement a containerized workflow, rather than trying to install these drivers directly on your OS.
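
As a rough illustration of the containerized approach, here is a minimal sketch that only prints a docker command for review rather than running it. It assumes Docker is installed; the image name rocm/rocm-terminal and the helper rocm_container_cmd are examples chosen for this sketch, not something specified in this thread. The device flags pass the host's /dev/kfd and /dev/dri nodes into the container, so the host only needs the in-kernel amdgpu driver.

```shell
#!/bin/sh
# Assumption: the image name is only an example; substitute whichever
# ROCm image fits your workload.
ROCM_IMAGE="${ROCM_IMAGE:-rocm/rocm-terminal}"

# Hypothetical helper: print the docker command instead of running it,
# so it can be reviewed before use.
rocm_container_cmd() {
    printf '%s\n' "docker run -it --rm --device=/dev/kfd --device=/dev/dri --group-add video --security-opt seccomp=unconfined $1"
}

rocm_container_cmd "$ROCM_IMAGE"
```

Because the ROCm userspace lives inside the image, a kernel update on the host should not require reinstalling anything from amdgpu-install.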

Yeah, I am sorry about that comment. I'm just mad that I have to reinstall everything. I have a 2 TB NVMe drive that acts as secondary storage for my PC, and I'm temporarily moving all my important files there until I reinstall from the built-in recovery mode. After that, I'll move the files back to their respective places and delete them from the secondary drive. It just makes me really mad that I'm having to format and reinstall everything. But I am sorry for that comment.

I managed to get ROCm 6.0.2 working on Pop!_OS 22.04 LTS with the latest Linux kernel (6.8.0) and an RX 7900 GRE.

  1. First, make some kind of backup of your system (there's no guarantee these steps will work for you).
  2. Install the AMD driver using:
amdgpu-install --no-dkms
  3. Restart the system and ensure everything is working properly.
  4. Install ROCm:
amdgpu-install --usecase=rocm --no-dkms
  5. Add yourself for rendering and video permissions:
sudo usermod -aG video $USER
sudo usermod -aG render $USER
  6. Restart again and verify that ROCm is functioning properly:
rocminfo
rocm-smi

Keep in mind that this might entail a regression in some aspects. For instance, Mesa shifts from version 24 to 23.3 according to the command glxinfo -B.

Good luck!
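
The checks in the last step above can be scripted. This is a minimal sketch, assuming a POSIX shell; the helper name has_group is made up for illustration. It reports group membership, whether the ROCm tools are on PATH, and the running kernel, without changing anything on the system.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the space-separated group list in $1
# contains the group named in $2 (whole-word match).
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Groups of the current login session; usermod changes only show up
# here after logging out and back in (or rebooting).
groups_now=$(id -nG)
for g in video render; do
    if has_group "$groups_now" "$g"; then
        echo "ok: member of $g"
    else
        echo "missing: $g (log out and back in after usermod)"
    fi
done

# Report whether the ROCm tools are on PATH; run them manually to
# inspect the GPU.
for tool in rocminfo rocm-smi; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool"
    else
        echo "not found: $tool"
    fi
done

# Kernel version, useful when comparing notes in this thread.
uname -r
```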

Can you show a picture of CPU-X to show that the GPU drivers are loaded, along with your kernel version? I don't think I'm going to install ROCm again after it destroyed my installation. Also, who knows what's going to happen with the next update. It could break things more.

It's in Spanish and I'm not familiar with CPU-X, but I believe this is what you're looking for:

[image]

I understand that you don't want to try again. I've nuked my system a couple of times after discovering that with Timeshift, you can 'snapshot' your system (excluding personal files) and try things carelessly.

Thank you. It seems the drivers have loaded. Do ROCm/AMDGPU-PRO offer better performance than the standard drivers included in Pop!_OS?

I don't believe they offer any additional benefits beyond the open-source drivers, aside from the ability to utilize ROCm. While I do occasionally game, and the performance is excellent, it was also stellar with the open-source drivers.

We have a WIP article that will explain how to install ROCm in an easier way, now that AMD provides an Ubuntu repository for it and offers a ROCm package that doesn't do anything DKMS-related: https://github.com/system76/docs/pull/1231/files

Can anyone confirm if following the instructions in that draft article gets you what you need?

I did a fresh reinstall of Pop!_OS and followed the steps outlined in the aforementioned article, and can confirm that ROCm seems to work just fine. Thank you!