rocm support
sonicrules1234 opened this issue · comments
I'd be willing to help with the testing for this
Just added ExecutionProvider::rocm() in b02744a. To use ROCm you'll have to build ONNX Runtime from source and point ort to the compiled libraries with ORT_STRATEGY=system.
Building onnxruntime with rocm doesn't support making a shared library, so when I compile, it's giving me "/usr/bin/ld: cannot find -lonnxruntime: No such file or directory", even if I add to LD_LIBRARY_PATH.
Is this an error when compiling ort or ONNX Runtime itself?
ort
Ah, when using ORT_STRATEGY=system with static libraries you need to set ORT_LIB_LOCATION to the library path, e.g. ORT_LIB_LOCATION=~/onnxruntime/build/Release/
Hm, looks like ort is compiling fine, but using it with your diffusers library is when it has the linking error, since there is no libonnxruntime.a either.
Here's a list of files and directories the rocm version of onnxruntime produced: https://gist.github.com/45585c5f7bd797bbe9c6e3998edf0b34
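Whether a build directory holds static archives, shared objects, or both decides how ort can link against it. The sketch below, using only the Rust standard library, sorts the `lib*` files in a directory into those two groups. The helper name `classify_libs` is hypothetical and not part of ort or ONNX Runtime.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper (not part of ort): returns the sorted names of the
/// static (`.a`) and shared (`.so`, `.so.N`) libraries found directly in `dir`.
pub fn classify_libs(dir: &Path) -> io::Result<(Vec<String>, Vec<String>)> {
    let mut static_libs = Vec::new();
    let mut shared_libs = Vec::new();

    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();

        // Only consider files following the conventional `lib*` naming.
        if !name.starts_with("lib") {
            continue;
        }

        if name.ends_with(".a") {
            static_libs.push(name);
        } else if name.ends_with(".so") || name.contains(".so.") {
            shared_libs.push(name);
        }
    }

    static_libs.sort();
    shared_libs.sort();
    Ok((static_libs, shared_libs))
}
```

Calling `classify_libs` on the directory that ORT_LIB_LOCATION points to would show, for instance, whether a `libonnxruntime.a` or `libonnxruntime.so` is present at all.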
Looks like I did not properly implement static linking with ORT_STRATEGY=system. 5044e45 should fix (most of) the linking issues.
Now I'm getting
error: could not find native static library protobuf-lited, perhaps an -L flag is missing?
error: could not compile ort due to previous error
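The "-L flag" in that message is the native library search path. A Cargo build script normally supplies it by printing `cargo:rustc-link-search` lines, and names each static library with `cargo:rustc-link-lib=static=NAME`. The sketch below only builds those directive strings. It is not ort's actual build script, and `link_directives` is a hypothetical helper.

```rust
use std::path::Path;

/// Hypothetical helper (not ort's build script): builds the lines a `build.rs`
/// would print so that rustc searches `lib_dir` and links the given static libraries.
pub fn link_directives(lib_dir: &Path, static_libs: &[&str]) -> Vec<String> {
    // Equivalent to passing `-L native=<lib_dir>` to rustc.
    let mut lines = vec![format!(
        "cargo:rustc-link-search=native={}",
        lib_dir.display()
    )];

    for &lib in static_libs {
        // Accept either a bare name ("protobuf-lite") or a file name ("libfoo.a").
        let name = lib.strip_prefix("lib").unwrap_or(lib);
        let name = name.strip_suffix(".a").unwrap_or(name);
        lines.push(format!("cargo:rustc-link-lib=static={}", name));
    }

    lines
}
```

In a real `build.rs`, each returned line would be written to standard output with `println!`. A library name with no matching `lib<name>.a` in any search directory produces the "could not find native static library" error shown above.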
I fixed the ort typo I mentioned last comment, which got ort compiling again, but I got https://gist.github.com/f0de93ba9fe3f0639a46d295b6f1e993 when compiling my program that uses your diffusers library.
https://gist.github.com/d081fd8ecb2812aaa5fe1795129f183b
I think it gave the same error but with the capital M this time
OrtSessionOptionsAppendExecutionProvider_ROCM is defined in onnxruntime/core/session/provider_bridge_ort.cc, which is built in libonnxruntime_session.a, which is linked by ort here, so I'm not sure why it can't find the symbol still. Maybe it's defined in another library that I can't see.
@sonicrules1234 what branch of microsoft/onnxruntime is checked out, and would it be possible to share the contents of the build directory?
Sorry it took so long to respond, I didn't see any email notification for this. Right now the files are on a somewhat corrupted partition. Once my pc is working properly again, I'll give this a try once more.
Okay, got it back to where it was on a different install. I'm using the main branch of onnxruntime, and pulled it out of the docker build.
https://github.com/microsoft/onnxruntime/tree/main/dockerfiles#rocm
The folder is 2.2G in size. If you have a place I can upload to, I can upload it.
OrtSessionOptionsAppendExecutionProvider_ROCM has apparently been broken for a long time, so I pushed 8889616 using a newer API. Linking should finally be fixed, please let me know how testing goes 😁
Ok, it compiles and runs now, but doesn't seem to be using the GPU.
Does it output anything when running with the environment variable RUST_LOG=ort=debug?
Nope
Add
[dependencies]
tracing = "0.1"
tracing-subscriber = "0.3"
to your Cargo.toml, and at the top of fn main(), add
fn main() {
tracing_subscriber::fmt::init();
...
then run again to see logs.
It's spamming
2023-03-05T22:29:09.229375Z DEBUG apply_execution_providers: ort::execution_providers: ROCm execution provider registration Err(Msg("/code/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1058 void onnxruntime::ProviderSharedLibrary::Ensure() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_shared.so with error: libonnxruntime_providers_shared.so: cannot open shared object file: No such file or directory\n"))
2023-03-05T22:29:09.353019Z DEBUG new{allocator=Device memory_type=Default}: ort::memory: Creating new OrtMemoryInfo.
2023-03-05T22:29:09.353052Z DEBUG drop{self=SessionBuilder { env: "default", allocator: Device, memory_type: Default }}: ort::session: Dropping the session options.
That shared library file does exist at the root of ORT_LIB_LOCATION
Try running with the env var LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORT_LIB_LOCATION
Doesn't seem to make a difference
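One way to check whether the search path actually reaches the provider library is to walk the LD_LIBRARY_PATH entries and look for the file name in each. The following is a minimal sketch using only the Rust standard library. `find_in_search_path` is a hypothetical helper, and it covers only the environment variable, not the loader's other search locations (rpath, ld.so.cache, default system directories).

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical helper: returns the first `<dir>/<file_name>` that exists as a
/// file among the directories listed in `search_path` (an LD_LIBRARY_PATH-style
/// string).
pub fn find_in_search_path(search_path: &str, file_name: &str) -> Option<PathBuf> {
    env::split_paths(search_path)
        // Empty entries are skipped here for simplicity; the real loader
        // treats an empty entry as the current working directory.
        .filter(|dir| !dir.as_os_str().is_empty())
        .map(|dir| dir.join(file_name))
        .find(|candidate| candidate.is_file())
}
```

A program could pass in the value of `std::env::var("LD_LIBRARY_PATH")` together with `"libonnxruntime_providers_shared.so"`. A `None` result would suggest that the variable, as seen by the running process, does not include the directory holding the library.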
Hello! I tried running it myself and hit the same error as above:
libonnxruntime_providers_shared.so: cannot open shared object file: No such file or directory
This was fixed by adding the library directory to LD_LIBRARY_PATH, using this Dockerfile:
FROM rocm/dev-ubuntu-22.04:5.6-complete
ARG ONNXRUNTIME_REPO=https://github.com/Microsoft/onnxruntime
ARG ONNXRUNTIME_BRANCH=main
WORKDIR /code
ENV PATH /opt/miniconda/bin:/code/cmake-3.26.3-linux-x86_64/bin:${PATH}
RUN git clone --single-branch --branch ${ONNXRUNTIME_BRANCH} --recursive ${ONNXRUNTIME_REPO} onnxruntime &&\
/bin/sh onnxruntime/dockerfiles/scripts/install_common_deps.sh &&\
cd onnxruntime &&\
/bin/sh ./build.sh --allow_running_as_root --config Release --update --build --parallel --cmake_extra_defines ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) --use_rocm --rocm_home=/opt/rocm \
# Modified from this point
--skip_submodule_sync --skip_tests --build_shared_lib &&\
cd build/Linux/Release/ &&\
make install
ENV ONNXRUNTIME_DIR="/code/onnxruntime/build/Linux/Release/"
ENV LD_LIBRARY_PATH=$ONNXRUNTIME_DIR:$LD_LIBRARY_PATH
# Next I copy and run a Rust application
But now I get these logs:
2023-08-01T10:56:25.642746Z INFO apply_execution_providers: ort::execution_providers: Successfully registered `ROCmExecutionProvider`
Segmentation fault (core dumped)
It is running inside Docker. Perhaps this is because I'm using ROCm 5.6?
How can I debug this segmentation fault?
EDIT: Just wasted 30 minutes, it doesn't work with ROCm 5.4 either.