oracle / sd4j

Stable diffusion pipeline in Java using ONNX Runtime

Failed to load library libortextensions.so

johanjanssen opened this issue

Hi, I wanted to try this awesome project and write an article about it, but I encountered an issue.

I compiled onnxruntime-extensions on Linux and copied the library (actually everything, since it wasn't working) into the sd4j directory, which contains your git repository:

sd4j$ ls -1
CONTRIBUTING.md
images
libnoexcep_operators.a
libocos_operators.a
libortcustomops.a
libortextensions.so
libortextensions.so.0
libortextensions.so.0.10.0
LICENSE.txt
pom.xml
README.md
SECURITY.md
src
target
text_tokenizer
THIRD_PARTY_LICENSES.txt

Then I ran:

sd4j$ mvn package exec:exec -DmodelPath=../stable-diffusion-v1-5/

And I got the following error:

[INFO] --- exec-maven-plugin:3.1.0:exec (default-cli) @ sd4j ---
Gtk-Message: 21:05:49.827: Failed to load module "canberra-gtk-module"
Exception in thread "main" java.lang.IllegalStateException: Failed to instantiate SD4J pipeline
	at com.oracle.labs.mlrg.sd4j.SD4J.factory(SD4J.java:356)
	at com.oracle.labs.mlrg.sd4j.SD4JApp.<init>(SD4JApp.java:89)
	at com.oracle.labs.mlrg.sd4j.SD4JApp.main(SD4JApp.java:421)
Caused by: ai.onnxruntime.OrtException: Error code - ORT_FAIL - message: Failed to load library libortextensions.so with error: libortextensions.so: kan gedeeld objectbestand niet openen: Bestand of map bestaat niet
	at ai.onnxruntime.OrtSession$SessionOptions.registerCustomOpLibrary(Native Method)
	at ai.onnxruntime.OrtSession$SessionOptions.registerCustomOpLibrary(OrtSession.java:700)
	at com.oracle.labs.mlrg.sd4j.TextEmbedder.<init>(TextEmbedder.java:113)
	at com.oracle.labs.mlrg.sd4j.SD4J.factory(SD4J.java:340)
	... 2 more
[ERROR] Command execution failed.

I looked at the code, and TextEmbedder doesn't even seem to load libortextensions, but ortextensions:

this.tokenizerOpts.registerCustomOpLibrary(System.mapLibraryName("ortextensions"));

Probably I'm doing something stupid here, but I couldn't quickly figure out what, so any help/tips would be really appreciated.

The mapLibraryName call does the right thing: you can see it produced libortextensions.so, which is the correct form for Linux (on Windows it would turn ortextensions into ortextensions.dll). How big is the libortextensions.so file (or whatever it's linked to)?
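For readers following along, here is a minimal sketch of the name mapping described above. The class and method names (LibraryNameDemo, platformName) are made up for illustration and are not part of sd4j; only System.mapLibraryName is a real JDK call, and its result depends on the platform the JVM runs on.

```java
// Illustrative only: LibraryNameDemo and platformName are hypothetical names, not sd4j code.
final class LibraryNameDemo {

    // Returns the platform-specific file name for a bare library name.
    // The JDK adds the platform's prefix and suffix, so the result differs
    // between Linux, macOS and Windows.
    static String platformName(String baseName) {
        return System.mapLibraryName(baseName);
    }

    public static void main(String[] args) {
        // Print the mapped name for the library discussed in this thread.
        System.out.println(platformName("ortextensions"));
    }
}
```

Note that this only builds a file name. It does not check that the file exists or say anything about where the native loader will look for it.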

The other thing to check is the version: this currently targets ORT 1.14 and ORT-extensions 0.7.0. I forgot to put the ORT-extensions version number in the docs, so I think you compiled the head of ORT-extensions; try compiling the v0.7.0 tag instead. I wouldn't expect a version mismatch to result in a file not found error (which is what Google Translate said that error was; my Dutch is not great, so let me know if it means something else), unless there were linker errors. Could you run ldd libortextensions.so and see if it found all the libraries?

I managed to generate an image using ORT 1.14 and the head of the ORT-extensions main branch on an Ubuntu 22.04 desktop, so the version isn't the problem. I get the following output for the libraries it needs:

$ ldd libortextensions.so 
	linux-vdso.so.1 (0x00007ffdb533e000)
	libgtk3-nocsd.so.0 => /lib/x86_64-linux-gnu/libgtk3-nocsd.so.0 (0x00007f012ea00000)
	libopenblas.so.0 => /opt/OpenBLAS/lib/libopenblas.so.0 (0x00007f012da00000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f012edbe000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f012d600000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f012ecd7000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f012ecb7000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f012d200000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f012f696000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f012ecb2000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f012ecad000)
	libgfortran.so.3 => /lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f012d8ce000)
	libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f012ec63000)
	libquadmath.so.0 => /lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f012ec19000)

Thanks for the quick reply. It does indeed look like I'm missing some libraries:

sd4j$ ldd libortextensions.so
	linux-vdso.so.1 (0x00007fffc951b000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f5eb5a9e000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f5eb4c00000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5eb59b5000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f5eb5991000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5eb4800000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f5eb5acf000)

So I used v0.7.0:

onnxruntime-extensions$ git status
HEAD detached at v0.7.0
nothing to commit, working tree clean

And built it again:

./build_lib.sh --config Release --update --build --parallel

[100%] Linking CXX shared library lib/libortextensions.so
[100%] Built target extensions_shared
2023-12-19 23:02:23,960 util.run [DEBUG] - Subprocess completed. Return code: 0
2023-12-19 23:02:23,960 build [INFO] - Build complete

However it gives the same result:

onnxruntime-extensions/build/Linux/Release/lib$ ldd libortextensions.so 
	linux-vdso.so.1 (0x00007ffdaa7e7000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f2af5d44000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f2af5000000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f2af5317000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f2af5d20000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2af4c00000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f2af5d75000)

I checked the output and it complains about some things it can't find (see below), but as shown above the build still succeeds.

2023-12-19 23:05:15,253 build [DEBUG] - Command line arguments:
  --config Release --update --build --parallel
2023-12-19 23:05:15,254 build [INFO] - Build started
2023-12-19 23:05:15,254 build [INFO] - Generating CMake build tree
2023-12-19 23:05:15,254 util.run [INFO] - Running subprocess in 'build/Linux/Release'
  /usr/bin/cmake /home/johan/InfoQ/SD4J/onnxruntime-extensions -DPython_EXECUTABLE=/usr/bin/python3 -DOCOS_ENABLE_SELECTED_OPLIST=OFF -G 'Unix Makefiles' -DCMAKE_BUILD_TYPE=Release
-- ONNX Runtime URL suffix: v1.10.0/onnxruntime-linux-x64-1.10.0.tgz
-- Fetch googlere2
-- Fetch opencv
-- Detected processor: x86_64
-- Looking for ccache - not found
-- libjpeg-turbo: VERSION = 2.1.0, BUILD = opencv-4.5.4-libjpeg-turbo
-- libva: missing va.h header (VA_INCLUDE_DIR)
-- Could not find OpenBLAS include. Turning OpenBLAS_FOUND off
-- Could not find OpenBLAS lib. Turning OpenBLAS_FOUND off
-- Could NOT find Atlas (missing: Atlas_CBLAS_INCLUDE_DIR Atlas_CLAPACK_INCLUDE_DIR Atlas_CBLAS_LIBRARY Atlas_BLAS_LIBRARY Atlas_LAPACK_LIBRARY) 
-- Could NOT find BLAS (missing: BLAS_LIBRARIES) 
-- Could NOT find LAPACK (missing: LAPACK_LIBRARIES) 
    Reason given by package: LAPACK could not be found because dependency BLAS could not be found.

-- VTK is not found. Please set -DVTK_DIR in CMake to VTK build directory, or to VTK install subdirectory with

Then I looked at the libraries that were missing from my libortextensions.so and installed some of them:

sudo apt-get install gtk3-nocsd
sudo apt-get install libopenblas64-dev

However that didn't seem to help:

onnxruntime-extensions/build/Linux/Release/lib$ ldd libortextensions.so 
	linux-vdso.so.1 (0x00007ffd755c3000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fc181824000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc180a00000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc18173b000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc181717000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc180600000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fc181855000)

Do you know how I can get those libraries linked into libortextensions.so?

The different set of libraries shouldn't make much difference; the point of the check is to confirm that none of them say <foo> => not found, as that would cause a linking error. We only use the tokenizer from onnxruntime-extensions, so it doesn't matter that yours isn't linking against BLAS or other math stuff like mine did.
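The check described above can be sketched in a few lines: scan the ldd output for entries marked "not found". LddCheck and unresolved are hypothetical names, and the libfoo.so.1 entry in the sample is invented to show what a failing line looks like; it does not come from the outputs posted in this thread.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: LddCheck is a hypothetical helper, not part of sd4j or ONNX Runtime.
final class LddCheck {

    // Returns the (trimmed) ldd output lines that report an unresolved dependency.
    static List<String> unresolved(List<String> lddLines) {
        return lddLines.stream()
                .map(String::trim)
                .filter(line -> line.contains("=> not found"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "\tlibz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f5eb5a9e000)",
                "\tlibfoo.so.1 => not found"); // invented failing entry
        // Print only the entries the dynamic linker could not resolve.
        unresolved(sample).forEach(System.out::println);
    }
}
```

An empty result for the real output means every dependency was resolved, which is what both ldd listings in this thread show.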

This suggests that it's some kind of path issue, which is weird because it worked fine on my Ubuntu box earlier. Since you're running the exec through Maven, it shouldn't be a working directory issue (it looks for that library in the working directory), so maybe there's something odd in the path resolution logic inside ONNX Runtime. Could you try adding the absolute path into line 113 in TextEmbedder (e.g. this.tokenizerOpts.registerCustomOpLibrary("/<path>/<from>/<root>/<to>/sd4j/"+System.mapLibraryName("ortextensions"));)?

ORT's library loader bottoms out here: https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/platform/posix/env.cc#L537. It might be that I should modify ORT's Java code to convert that String into a path, get the absolute path and then pass that in. Either that, or prepend ./ to it on Linux, as it seems like dlopen might not always search the working directory first.

If the absolute path works for you, then could you also try using "./"+System.mapLibraryName("ortextensions") and see if that works?
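A rough sketch of the two candidate strings discussed above, for anyone who wants to print them before editing TextEmbedder. LibraryPathDemo and its methods are hypothetical names; the code only builds the strings and does not call ONNX Runtime. As far as I understand the Linux dlopen documentation, a name containing a slash is treated as a path, while a bare name goes through the library search path, which would explain why the "./" prefix matters. Treat that as background, not as a statement about ORT's internals.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: LibraryPathDemo is a hypothetical helper, not part of sd4j.
final class LibraryPathDemo {

    // Absolute form: the mapped file name resolved against the given directory.
    static String absolute(Path dir, String baseName) {
        return dir.toAbsolutePath().normalize()
                .resolve(System.mapLibraryName(baseName))
                .toString();
    }

    // Relative form: an explicit "./" prefix, so the name contains a slash.
    static String relativeToWorkingDir(String baseName) {
        return "./" + System.mapLibraryName(baseName);
    }

    public static void main(String[] args) {
        // Either string could then be passed to registerCustomOpLibrary.
        System.out.println(absolute(Paths.get("."), "ortextensions"));
        System.out.println(relativeToWorkingDir("ortextensions"));
    }
}
```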

Thanks a lot for the pointers, I managed to get it working on my machine. Apparently it prepends "lib" to "ortextensions", but somehow that doesn't work on my machine. I got it working by placing the libortextensions.so file in the sd4j/lib folder and then changing the library name in the code as follows:

this.tokenizerOpts.registerCustomOpLibrary(System.mapLibraryName("/libortextensions"));

Indeed the workaround you proposed also works:

this.tokenizerOpts.registerCustomOpLibrary("./"+System.mapLibraryName("ortextensions"));
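To see why the first variant (the one passing "/libortextensions") ends up pointing into a lib folder, it may help to print what mapLibraryName returns for a name that starts with a slash. MappedNameDemo is a hypothetical class; my understanding is that the JDK simply concatenates the platform prefix, the given name and the platform suffix, so on Linux the "lib" prefix lands in front of the slash and the result reads as a relative path. I have not checked this on every JVM, so run it on yours to confirm.

```java
// Illustrative only: MappedNameDemo is a hypothetical class, not part of sd4j.
final class MappedNameDemo {

    // Maps a library name exactly as given, including any leading slash.
    static String mapped(String name) {
        return System.mapLibraryName(name);
    }

    public static void main(String[] args) {
        // The name used in the first workaround from this thread.
        System.out.println(mapped("/libortextensions"));
    }
}
```

Because that result depends on the platform prefix, the "./" form is likely the more portable of the two workarounds.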

OK, I'll check that it behaves itself on macOS and Windows, then update the code.

Great, thanks for the quick support! You can expect an article about it on InfoQ in the near future.

Great, I think we can close this issue now?

Yes, it now works on my machine.

I checked on Windows, macOS and my Linux box and the fix didn't affect anything on those systems, so I've pushed it to main.