ROCm / ROCm

AMD ROCm™ Software - GitHub Home

Home Page: https://rocm.docs.amd.com

Ubuntu 24.04 support

erkinalp opened this issue

Ubuntu 24.04 is already in its feature freeze. We need time to test, report, and fix bugs before the release, rather than long after it.

Related to #2939

From the bug above:

This is actually a constant problem we are experiencing with ROCm and AMDGPU (yes, the official AMD driver and library stack for compute-oriented GPUs), because they update their drivers right on the release day and do not leave us any time to test them.

Sounds like making pre-release/beta versions of ROCm publicly available would help them too.

Now that 24.04 is released, any news on plans for 24.04 support?

This seems rather critical, since Ubuntu 22.04 has a known issue with

[ 7813.309408] workqueue: ttm_bo_delayed_delete [amdttm] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND

which is fixed in 23.04 (and thus 24.04):

https://gitlab.freedesktop.org/drm/amd/-/issues/2282#note_1901512

Right now it's pretty easy to crash any app that uses the GPU for TensorFlow or ONNX workloads on Ubuntu because of this.

Please, this is important for tools like PyTorch, hashcat, Blender, etc., which need the GPU.

Any news?

With amdgpu-install_6.1.60101-1_all.deb on Ubuntu 24.04 (6.9.2 kernel):

$ amdgpu-install --usecase=rocm
[…]
The following packages have unmet dependencies:
 amd-smi-lib : Depends: python3-pip but it is not installable
               Recommends: python3-argcomplete but it is not installable
 hipsolver : Depends: libcholmod3 but it is not installable
             Depends: libsuitesparseconfig5 but it is not installable
 rocm-gdb : Depends: libtinfo5 but it is not installable
            Depends: libncurses5 but it is not installable
            Depends: libpython3.10 but it is not installable or
                     libpython3.8 but it is not installable
 rocm-llvm : Depends: libstdc++-5-dev but it is not installable or
                      libstdc++-7-dev but it is not installable or
                      libstdc++-11-dev but it is not installable
             Depends: libgcc-5-dev but it is not installable or
                      libgcc-7-dev but it is not installable or
                      libgcc-11-dev but it is not installable
             Recommends: gcc-multilib but it is not going to be installed
             Recommends: g++-multilib but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 hipsolver : Depends: libcholmod3 but it is not installable
             Depends: libsuitesparseconfig5 but it is not installable
 rocm-gdb : Depends: libtinfo5 but it is not installable
            Depends: libncurses5 but it is not installable
            Depends: libpython3.10 but it is not installable or
                     libpython3.8 but it is not installable
E: Unable to correct problems, you have held broken packages.
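For anyone who wants to turn output like the above into a checklist, here is a minimal sketch that collects the hard "Depends" entries apt reports as not installable. The function name `missing_dependencies` is made up for illustration, and the parsing assumes the line layout shown in this thread; other apt versions may format the text differently.

```python
import re

# Assumed layout, taken from the apt output pasted in this thread:
#  " pkg : Depends: dep but it is not installable [or]"
#  followed by indented continuation lines without a package name.
_HEADER = re.compile(r"^\s*([\w.+-]+) : ")
_KIND = re.compile(r"(PreDepends|Depends|Recommends|Suggests):")
_DEP = re.compile(r"([\w.+-]+) but it is not installable")


def missing_dependencies(apt_output: str) -> dict[str, list[str]]:
    """Map each package to the hard dependencies apt could not install."""
    missing: dict[str, list[str]] = {}
    package = None  # package the current line belongs to
    kind = None     # relation type carried over to continuation lines
    for line in apt_output.splitlines():
        header = _HEADER.match(line)
        if header:
            package, kind = header.group(1), None
        relation = _KIND.search(line)
        if relation:
            kind = relation.group(1)
        dep = _DEP.search(line)
        if dep and package and kind in ("Depends", "PreDepends"):
            deps = missing.setdefault(package, [])
            if dep.group(1) not in deps:  # apt prints the block twice
                deps.append(dep.group(1))
    return missing
```

Feeding it the text above should list, for example, libtinfo5, libncurses5, libpython3.10 and libpython3.8 under rocm-gdb, while skipping the "Recommends" lines.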

Same :(

Try https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/native-install/ubuntu.html or https://github.com/nktice/AMD-AI

This also helped get me to the next step. I now have rocm installed (thank you), but it doesn't seem to have library files for my older AMD GPU:

rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1032
 List of available TensileLibrary Files : 
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx940.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx941.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat"

I'm trying to compile llama.cpp if that makes a difference.
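A small sketch of how one might read that error: extract the architectures rocBLAS ships libraries for and look for a sibling of the missing target. The helper name `same_family_targets` is hypothetical, and "same family" here is only a heuristic (same name except for the last character); it is an assumption, not something rocBLAS defines.

```python
import re


def same_family_targets(rocblas_error: str, target: str) -> list[str]:
    """List shipped targets that differ from `target` only in the last character."""
    # File names follow the pattern seen in the error text above.
    available = set(
        re.findall(r"TensileLibrary_lazy_(gfx[0-9a-f]+)\.dat", rocblas_error)
    )
    family = target[:-1]  # e.g. "gfx103" for "gfx1032" (heuristic)
    return sorted(t for t in available if t != target and t[:-1] == family)
```

Given the list above and the target gfx1032, the only candidate this heuristic finds is gfx1030.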

@mikeperalta1
List your GPU here with rocminfo or clinfo so that I can see what is going on.
If you have no luck with ROCm, I guess you will have to try the OpenCL route based on CLBlast, which is also a project I helped tune for speed in llama.cpp.

By the way, I did encounter some hanging behaviour with my 7900XTX on Ubuntu 24.04 while selecting any word in the GNU Octave editor when the .m file is large (I don't know if this is a driver problem). I also encountered a WebGPU problem with the 7900XTX but not with the 7800XT (the two cards are in two different PCs with different motherboards, x570/550, and the same CPU, a 5700G).

While adding the old PPAs might work for some use cases, it is also a great way to brick your system and should not become an advised option at all. There's a good reason these PPAs aren't included as a fallback by default. Please exercise great caution before attempting anything of the sort.

I ended up using an ENV trick found elsewhere to get the program to run. Basically a var to force an earlier version that supports my card:

HSA_OVERRIDE_GFX_VERSION=10.3.0
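As a rough sketch of the trick, the snippet below derives the override string from a target name and builds an environment for a child process. The name-to-version rule (leading digits are the major version, the last two characters are minor and stepping, read as hex) is an assumption inferred from the gfx1030 / 10.3.0 pair in this thread, and both helper names are made up.

```python
import os


def gfx_target_to_override(target: str) -> str:
    """Turn a name such as 'gfx1030' into '10.3.0' (assumed naming rule)."""
    if not target.startswith("gfx") or len(target) < 6:
        raise ValueError(f"unexpected target name: {target!r}")
    digits = target[3:]
    major, minor, stepping = digits[:-2], digits[-2], digits[-1]
    # Minor and stepping are treated as hex digits, so 'a' becomes 10.
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"


def override_env(target: str, base_env=None) -> dict[str, str]:
    """Copy of the environment with HSA_OVERRIDE_GFX_VERSION set for `target`."""
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_target_to_override(target)
    return env
```

The returned dict can be passed as `env=` to `subprocess.run` so that only the launched program sees the override, rather than exporting it for the whole shell session.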

But here's the info in case anyone is interested

*******                  
Agent 2                  
*******                  
  Name:                    gfx1032                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 6600 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      2048(0x800) KB                     
    L3:                      32768(0x8000) KB                   
  Chip ID:                 29695(0x73ff)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2900                               
  BDFID:                   768                                
  Internal Node ID:        1                                  
  Compute Unit:            32                                 
  SIMDs per CU:            2                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 118                                
  SDMA engine uCode::      76                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1032         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32
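When a dump like this is long, the few fields that matter for support questions can be pulled out with a short script. This is a minimal sketch that assumes the "Agent N" / "Key: value" layout shown above; `gpu_agents` is a hypothetical helper, not part of any ROCm tool.

```python
import re


def gpu_agents(rocminfo_text: str) -> list[dict[str, str]]:
    """Return one dict of fields per agent whose Device Type is GPU."""
    agents: list[dict[str, str]] = []
    current = None
    for line in rocminfo_text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"Agent \d+", stripped):
            current = {}
            agents.append(current)
            continue
        if current is None or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        # Some fields are printed with a double colon ("uCode:: 118").
        # setdefault keeps the first occurrence, so the agent-level
        # "Name" wins over the later ISA "Name".
        current.setdefault(key.strip(), value.lstrip(":").strip())
    return [a for a in agents if a.get("Device Type") == "GPU"]
```

For the output above, the single GPU entry would carry Name gfx1032 and Marketing Name AMD Radeon RX 6600 XT.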

Using HSA_OVERRIDE_GFX_VERSION to override the device info to an officially supported one is normal for many of us. The instruction set within each generation should be the same, but the tuning may not be optimal for a chip with different caches and bandwidth. (Correct me if I'm wrong: I did not take registers or memory size into account. The registers should be the same, otherwise the instructions would not run, while the memory size varies but there seems to be only one tuning profile.) Interestingly, in my experience tuning CLBlast, the 7800XT behaved similarly even when using the tuning profile for the 7900XTX (but this is just the open-source tuning method).

@erkinalp Is this ticket still relevant? If not, please close the ticket. Thanks!

Yes, still cannot install.

@ppanchad-amd 24.04 is not listed on that page, and the APT repo also lacks the release definitions for 24.04.

@erkinalp Sorry, 24.04 is not officially supported yet in the latest ROCm 6.1.2. Thanks!