ROCm / ROCm

AMD ROCm™ Software - GitHub Home

Home Page:https://rocm.docs.amd.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[Issue] ROCm 6.1 - Ubuntu package for hipblas-dev should run ldconfig after install

linuxtek-canada opened this issue · comments

Problem Description

When installing hipblas-dev on Ubuntu using apt, the package installation should run ldconfig before completing.

We are testing building LocalAI using ROCm 6.1 based on an Ubuntu 22.04 image: rocm/dev-ubuntu-22.04:6.1.

Our build succeeds, but the execution fails with this error:

localai-1  | ./local-ai: error while loading shared libraries: libhipblas.so.2: cannot open shared object file: No such file or directory

We worked around the issue in our Dockerfile by manually calling ldconfig after the package installations, and were able to get everything running successfully.

This is due to the hipblas-dev package installation not running ldconfig, which is the standard policy when installing shared libraries. Any AMD built deb package for Debian/Ubuntu should follow this policy:

Any such package must have the line activate-noawait ldconfig in its triggers control file (i.e. DEBIAN/triggers).
  • /opt/rocm-6.1.0/lib isn't listed in /etc/ld.so.conf, it's in /etc/ld.so.conf.d. However:
  • If a file is added to /etc/ld.so.conf.d/ and that location is included from /etc/ld.so.conf, it should follow this policy.
  • /etc/ld.so.conf only has one line, and that is include /etc/ld.so.conf.d/*.conf, so anything in /etc/ld.so.conf.d/ is in /etc/ld.so.conf.
  • If this is done as part of the package installation, it may not be needed as part of post-installation instructions.

Operating System

Ubuntu 22.04 Jammy

CPU

AMD Ryzen 7 7800X3D 8-Core Processor

GPU

AMD Radeon RX 7900 XT

ROCm Version

ROCm 6.1.0

ROCm Component

hipBLAS

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

Digging into this a bit more to confirm the package is missing the trigger, and so I can understand:

Example - libc6 package

I downloaded the .deb for libc6 which also depends on a number of libraries, and adds new library files. I extracted the control files, including the triggers:

apt-get download libc6
ar vx libc6_2.35-0ubuntu3.7_amd64.deb
mkdir control && tar -I zstd -xvf control.tar.zst -C control
ls -al  ./control

In the control directory, you can see the triggers file:

drwxr-xr-x. 1 root root    148 Apr 16 13:40 .
drwxr-xr-x. 1 root root    180 May  3 15:51 ..
-rw-r--r--. 1 root root     40 Apr 16 13:40 conffiles
-rw-r--r--. 1 root root   1353 Apr 16 13:40 control
-rw-r--r--. 1 root root  21606 Apr 16 13:40 md5sums
-rwxr-xr-x. 1 root root   7576 Apr 16 13:40 postinst
-rwxr-xr-x. 1 root root   1067 Apr 16 13:40 postrm
-rwxr-xr-x. 1 root root  17402 Apr 16 13:40 preinst
-rw-r--r--. 1 root root    930 Apr 16 13:40 shlibs
-rw-r--r--. 1 root root 151495 Apr 16 13:40 symbols
-rw-r--r--. 1 root root  85281 Apr 16 13:40 templates
-rw-r--r--. 1 root root     72 Apr 16 13:40 triggers

Which contains the expected command to run ldconfig:

# Triggers added by dh_makeshlibs/13.6ubuntu1
activate-noawait ldconfig

Example - hipblas-dev

Comparing this to the hipblas-dev package, I ran these commands to download the .deb file, extract the control files and examine:

apt-get download hipblas-dev
ar vx hipblas-dev_2.1.0.60100-82~22.04_amd64.deb
mkdir control && tar -xvf control.tar.gz -C control
ls -al ./control

This is what I see:

drwxr-xr-x. 1 root root  54 May  3 15:53 .
drwxr-xr-x. 1 root root 202 May  3 15:53 ..
-rw-r--r--. 1 root root 279 Apr 12 04:46 control
-rw-r--r--. 1 root root 694 Apr 12 04:46 md5sums
-rw-r--r--. 1 root root   0 Apr 12 04:46 postinst
-rw-r--r--. 1 root root   0 Apr 12 04:46 prerm

So we are missing the expected triggers at the very least.

@linuxtek-canada Internal ticket has been created to investigate this issue. Thanks!