ROCm / ROCm

AMD ROCm™ Software - GitHub Home

Home Page:https://rocm.docs.amd.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[Issue]: ROCm 6.1 amdgpu-dkms fails to build on Linux

hankster112 opened this issue · comments

Problem Description

Issue is with latest release of ROCm 6.1.

DKMS make.log for amdgpu-6.7.0-1756574.22.04 for kernel 6.6.13+bpo-amd64 (amd64)
Wed Apr 17 09:26:34 AM CDT 2024
make: Entering directory '/usr/src/linux-headers-6.6.13+bpo-amd64'
/tmp/amd.Z35UpKSZ/Makefile:52: *** dma_resv->seq is missing. exit....  Stop.
make[1]: *** [/usr/src/linux-headers-6.6.13+bpo-common/Makefile:1938: /tmp/amd.Z35UpKSZ] Error 2
make: *** [/usr/src/linux-headers-6.6.13+bpo-common/Makefile:246: __sub-make] Error 2
make: Leaving directory '/usr/src/linux-headers-6.6.13+bpo-amd64'

Used the latest install instructions from https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html
Tried with repos for Ubuntu 22.04 and Ubuntu 20.04 which should also work on Debian. Purged all config files and AMD apt sources beforehand.

Devuan 5 Daedalus (fork of Debian 12 without systemd)
AMD Ryzen 7 2700
MSI Radeon 5700 XT
Kernel 6.6.13+bpo-amd64

Operating System

Debian 12

CPU

AMD Ryzen 7 2700

GPU

AMD Radeon RX 7900 XT

ROCm Version

ROCm 6.0.0

ROCm Component

No response

Steps to Reproduce

Attempt to build from source on Debian 12.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 2700 Eight-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 2700 Eight-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3200                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    16301932(0xf8bf6c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16301932(0xf8bf6c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16301932(0xf8bf6c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1010                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 5700 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      4096(0x1000) KB                    
  Chip ID:                 29471(0x731f)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2100                               
  BDFID:                   11008                              
  Internal Node ID:        1                                  
  Compute Unit:            40                                 
  SIMDs per CU:            2                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    1280(0x500)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS:                     
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1010:xnack-  
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***     

Additional Information

No response

https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.1.0/reference/system-requirements.html suggests kernel 6.6 is not supported (yet?). I'm concerned about this because of #2993 -> #2939.

https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.1.0/reference/system-requirements.html suggests kernel 6.6 is not supported (yet?). I'm concerned about this because of #2993 -> #2939.

My distro only has official kernels for 6.1 or 6.6, no in-between. It's like I'm caught between downgrading to Debian 11 where kernel 5.x might work, or waiting until 6.x to see if ROCm might work. It's really frustrating.

Yes, similar situation with Ubuntu 22.04 vs 24.04.

Most recent kernel I have working with ROCm is kernel 6.5 with Vega20 GPU.

6.6 kernel support was promised earlier to arrive in the 6.1 release. It hasn't.

Even if the failing check on lines 51-53 of the DKMS driver Makefile is commented out, there are more compilation errors, and those also look the same as in the 6.0 driver release. I fail to see what's changed with regards to newer kernel support.

are you able to use stock upstream amdgpu driver and firmware with just the ROCm userspace install ? Usually that is also at a good functioning state.

are you able to use stock upstream amdgpu driver and firmware with just the ROCm userspace install ? Usually that is also at a good functioning state.

No, I need ROCm for Blender GPU support.

commented

ROCM works perfectly fine without the dkms module with just the upstream amdgpu module built into the kernel. the dkms module is really only for when some features have not landed in the kernel and to support old enterprise distributions:
i would like to add the following strong recommendations:

  1. Prefer simply using the upstream default amdgpu, unless you expirance issues or need a specific feature it lacks. If you are on a recent kernel its generally better than the dkms module anyhow
  2. STROGNLY prefer to use rocm from distro packages (available in debian, arch linux and others) over using the offical amd packages or distrobutions.

ROCM works perfectly fine without the dkms module with just the upstream amdgpu module built into the kernel. the dkms module is really only for when some features have not landed in the kernel and to support old enterprise distributions: i would like to add the following strong recommendations:

  1. Prefer simply using the upstream default amdgpu, unless you expirance issues or need a specific feature it lacks. If you are on a recent kernel its generally better than the dkms module anyhow
  2. STROGNLY prefer to use rocm from distro packages (available in debian, arch linux and others) over using the offical amd packages or distrobutions.

ROCm is not available from my distro's package manager. Trying to install ROCm from AMD's website with the --no-dkms flag results in failed dependences since it relies on systemd for some reason (Devuan is a fork of Debian with no dependencies on systemd). Blender needs ROCm to use GPU acceleration.

commented

Your trying to install packages from a totally (Ubuntu) different distribution on devuan of course that dosent work. As with any piece of software on linux, unless its totaly static (which you defiantly dont want for a system library like compute drivers) you need to either have it in your distros repository or you need to compile it yourself. Since devuan is so close to debian you might also get away with installing the rocm packages for debian for the corrisponding debian version, but thats not great either.

I would also petition your distro maintainers to import the packages from debian to build on thair ci, since the debian-ai team have packaged all rocm componants with /debian directories this should be pretty easy, with only limited modification of the packageing files for devuan.

ROCM works perfectly fine without the dkms module with just the upstream amdgpu module built into the kernel. the dkms module is really only for when some features have not landed in the kernel and to support old enterprise distributions: i would like to add the following strong recommendations:

  1. Prefer simply using the upstream default amdgpu, unless you expirance issues or need a specific feature it lacks. If you are on a recent kernel its generally better than the dkms module anyhow
  2. STROGNLY prefer to use rocm from distro packages (available in debian, arch linux and others) over using the offical amd packages or distrobutions.

The upstream driver lacks power-saving features and colour control. My laptop display is very bright and drains the battery quickly. It has to be tuned down.

What seems to be the problem with upstream kernel support? My laptop is almost a year old and there's still no support in the mainline.

Also there is no external monitor support in the mainline driver, which is a big deal.

6.6 kernel support was promised earlier to arrive in the 6.1 release. It hasn't.

Kernels up to 6.8 are now supported with the latest rocm 6.1.1 release

6.6 kernel support was promised earlier to arrive in the 6.1 release. It hasn't.

Kernels up to 6.8 are now supported with the latest rocm 6.1.1 release

any supported way to get to 6.8 kernel? i tried mainline kernels process but anything above 6.7 won't install headers because of glibc update that's not on 22.0.4

would be nice to just have 24.04 support 👍

6.6 kernel support was promised earlier to arrive in the 6.1 release. It hasn't.

Kernels up to 6.8 are now supported with the latest rocm 6.1.1 release

Same error.

user@device:~$ sudo apt install amdgpu-dkms
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  amdgpu-dkms-firmware
The following NEW packages will be installed:
  amdgpu-dkms amdgpu-dkms-firmware
0 upgraded, 2 newly installed, 0 to remove and 7 not upgraded.
Need to get 22.3 MB of archives.
After this operation, 555 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 https://repo.radeon.com/amdgpu/6.1.1/ubuntu jammy/main amd64 amdgpu-dkms-firmware all 1:6.7.0.60101-1769056.22.04 [11.5 MB]
Get:2 https://repo.radeon.com/amdgpu/6.1.1/ubuntu jammy/main amd64 amdgpu-dkms all 1:6.7.0.60101-1769056.22.04 [10.8 MB]
Fetched 22.3 MB in 1s (17.7 MB/s)       
Selecting previously unselected package amdgpu-dkms-firmware.
(Reading database ... 520152 files and directories currently installed.)
Preparing to unpack .../amdgpu-dkms-firmware_1%3a6.7.0.60101-1769056.22.04_all.deb ...
Unpacking amdgpu-dkms-firmware (1:6.7.0.60101-1769056.22.04) ...
Selecting previously unselected package amdgpu-dkms.
Preparing to unpack .../amdgpu-dkms_1%3a6.7.0.60101-1769056.22.04_all.deb ...
Unpacking amdgpu-dkms (1:6.7.0.60101-1769056.22.04) ...
Setting up amdgpu-dkms-firmware (1:6.7.0.60101-1769056.22.04) ...
Setting up amdgpu-dkms (1:6.7.0.60101-1769056.22.04) ...
Loading new amdgpu-6.7.0-1769056.22.04 DKMS files...
Building for 6.6.13+bpo-amd64
Building for architecture amd64
Building initial module for 6.6.13+bpo-amd64
Error! Bad return status for module build on kernel: 6.6.13+bpo-amd64 (amd64)
Consult /var/lib/dkms/amdgpu/6.7.0-1769056.22.04/build/make.log for more information.
dpkg: error processing package amdgpu-dkms (--configure):
 installed amdgpu-dkms package post-installation script subprocess returned error exit status 10
Errors were encountered while processing:
 amdgpu-dkms
needrestart is being skipped since dpkg has failed
E: Sub-process /usr/bin/dpkg returned an error code (1)

Kernels up to 6.8 are now supported with the latest rocm 6.1.1 release

Yeah, those "up to 6.8" kernels are HWE, which don't exist outside Ubuntu. Is mainlining the only case when the GPU driver becomes available in Debian and other distros? Because I'm not using Ubuntu 22.

Is there a plan for mainlining? Are there any blockages?

Same error.

Then the problem is something else, maybe as mentioned previously related to Ubuntu's glibc version.

I just pointed out that the 6.1.1 rocm release does finally compile with a newish kernel, which the previous releases did not, due to kernel compatibility issues.

I'm currently running the latest rocm on Debian with a 6.8.10 kernel.

Then the problem is something else, maybe as mentioned previously related to Ubuntu's glibc version.

I just pointed out that the 6.1.1 rocm release does finally compile with a newish kernel, which the previous releases did not, due to kernel compatibility issues.

I'm currently running the latest rocm on Debian with a 6.8.10 kernel.

Does anyone get the apt GPG key signature mismatch? apt cannot download package lists.

Warning: GPG error: https://repo.radeon.com/amdgpu/6.1.1/ubuntu jammy InRelease: The following signatures were invalid: ERRSIG 9386B48A1A693C5C
Error: The repository 'https://repo.radeon.com/amdgpu/6.1.1/ubuntu jammy InRelease' is not signed.

I checked the key signature and it matches:

$ gpg --list-packets /etc/apt/trusted.gpg.d/rocm-keyring.gpg 
# off=0 ctb=99 tag=6 hlen=3 plen=525
:public key packet:
	version 4, algo 1, created 1470083360, expires 0
	pkey[0]: [4096 bits]
	pkey[1]: [17 bits]
	keyid: 9386B48A1A693C5C

Difficult to tell what's going on.

Same error with kernel 6.7.12 on Debian. That's amdgpu-dkms 6.1.1.

*** dma_resv->seq is missing. exit....  Stop.

Still waiting for any kind of update/explanation for this. Still on the same kernel when I made the OP, same error as vkomenda when building.

amdgpu DKMS module still fails to build for the latest 6.7.12 kernel.

dpkg: error processing package linux-headers-amd64 (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 linux-image-6.7.12+bpo-amd64
 linux-image-amd64
 linux-headers-6.7.12+bpo-amd64
 linux-headers-amd64
needrestart is being skipped since dpkg has failed
E: Sub-process /usr/bin/dpkg returned an error code (1)

The error in make.log:

DKMS make.log for amdgpu-6.7.0-1756574.22.04 for kernel 6.7.12+bpo-amd64 (x86_64)
Sun May 26 12:47:00 PM CDT 2024
make: Entering directory '/usr/src/linux-headers-6.7.12+bpo-amd64'
/tmp/amd.UmjcyhcM/Makefile:52: *** dma_resv->seq is missing. exit....  Stop.
make[1]: *** [/usr/src/linux-headers-6.7.12+bpo-common/Makefile:1936: /tmp/amd.UmjcyhcM] Error 2
make: *** [/usr/src/linux-headers-6.7.12+bpo-common/Makefile:246: __sub-make] Error 2
make: Leaving directory '/usr/src/linux-headers-6.7.12+bpo-amd64'

Same error with latest kernel.

DKMS make.log for amdgpu-6.7.0-1769056.20.04 for kernel 6.7.12+bpo-amd64 (amd64)
Tue May 28 05:45:45 PM CDT 2024
make: Entering directory '/usr/src/linux-headers-6.7.12+bpo-amd64'
/tmp/amd.DeGLA8su/Makefile:52: *** dma_resv->seq is missing. exit....  Stop.
make[1]: *** [/usr/src/linux-headers-6.7.12+bpo-common/Makefile:1936: /tmp/amd.DeGLA8su] Error 2
make: *** [/usr/src/linux-headers-6.7.12+bpo-common/Makefile:246: __sub-make] Error 2
make: Leaving directory '/usr/src/linux-headers-6.7.12+bpo-amd64'

@nartmada @ppanchad-amd Can we get an internal JIRA made and assign it to the KCL team? I feel like the resv->seq check isn't correct here. Even though this isn't a supported kernel, it's still an oversight. Hopefully they can figure out what's happening, even on an unsupported kernel . Thanks!

@kentrussell Internal JIRA ticket has been created to investigate this issue. Thanks!

@hankster112 Packaged driver on Debian is not supported. Closing ticket.

@hankster112 Packaged driver on Debian is not supported. Closing ticket.

You meant it's not supported and it will not be supported?

That it's not supported is already obvious.

commented

@vkomenda its supported via the packages in debain repo, there is nothing for amd to do here.

@vkomenda its supported via the packages in debain repo, there is nothing for amd to do here.

Can you please link this Debian repo. The installation instructions don't refer to any Debian repo. The amdgpu-install package I was downloading came from the Ubuntu repo.

commented

all the rocm componants are in the default debain repos where you would install anything from.
rocblas for instance: https://packages.debian.org/sid/librocblas0
its just apt install librocbas0 away

or rocm-smi:
https://packages.debian.org/sid/rocm-smi

@IMbackK what is missing in the debian repo? A fork of amdgpu-install? My main issue is that the GPU DKMS driver doesn't compile.

commented

Nothing is missing You dont need any amdgpu kernel driver besides the one that is already built into the upstream mainline kernel

commented

what gpu would that be?

what gpu would that be?

Radeon 780M

commented

Well aside from the fact that that is obviously not a supported rocm device and apus have some quirks with how memory managment works that makes them quite hard to use usefully with rocm atm, the mainline kernel absolutely should support full pm and all crts for that device. You are barking up the wrong tree, if one of those dose not work this is a kernel bug that you should file against the kernel on the mailing list (after trying the latest patch of linux 6.9 of course), not here.

Given the patch diff between mainline and rocm-dkms its also somewhat unlikely that it would behave any different anyhow.

Well aside from the fact that that is obviously not a supported rocm device and apus have some quirks with how memory managment works that makes them quite hard to use usefully with rocm atm, the mainline kernel absolutely should support full pm and all crts for that device. You are barking up the wrong tree, if one of those dose not work this is a kernel bug that you should file against the kernel on the mailing list (after trying the latest patch of linux 6.9 of course), not here.

Given the patch diff between mainline and rocm-dkms its also somewhat unlikely that it would behave any different anyhow.

Sorry, what did you say? I suggest you check the AMD driver installation instructions at https://www.amd.com/en/support/linux-drivers. There is no mention of the mainline kernel driver. It recommends amdgpu-install.

Of course it's no problem to report bugs to the mainline if there is no consistency within the AMD, sure.