pykeio / ort

A Rust wrapper for ONNX Runtime

Home Page: https://ort.pyke.io/

ROCm support

sonicrules1234 opened this issue

I'd be willing to help with testing this.

Just added ExecutionProvider::rocm() in b02744a. To use ROCm you'll have to build ONNX Runtime from source and point ort to the compiled libraries with ORT_STRATEGY=system.

Building ONNX Runtime with ROCm doesn't support producing a shared library, so when I compile, I get "/usr/bin/ld: cannot find -lonnxruntime: No such file or directory", even if I add the library directory to LD_LIBRARY_PATH.

Is this an error when compiling ort or ONNX Runtime itself?

When compiling ort.

Ah, when using ORT_STRATEGY=system with static libraries, you need to set ORT_LIB_LOCATION to the directory containing the libraries, e.g. ORT_LIB_LOCATION=~/onnxruntime/build/Release/
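To illustrate what pointing a build script at a directory of static libraries involves, here is a minimal sketch. The helper `link_directives` is hypothetical and is not ort's actual build.rs; it only shows how a directory such as ORT_LIB_LOCATION and a list of static library names become the `cargo:rustc-link-search` and `cargo:rustc-link-lib=static=` lines a Cargo build script prints. The library name in the commented usage is taken from this thread and is not the full set ort links.

```rust
use std::path::Path;

/// Hypothetical helper (not ort's real build script): builds the Cargo
/// directives a build.rs would print to link static libraries from `lib_dir`.
fn link_directives(lib_dir: &Path, static_libs: &[&str]) -> Vec<String> {
    // Add the directory (e.g. the value of ORT_LIB_LOCATION) to the native search path.
    let mut out = vec![format!(
        "cargo:rustc-link-search=native={}",
        lib_dir.display()
    )];
    // `static=` asks rustc to link the static archive (lib<name>.a on Unix-like
    // systems) rather than a shared library.
    for lib in static_libs {
        out.push(format!("cargo:rustc-link-lib=static={lib}"));
    }
    out
}

// In a build.rs, main() could use it like this:
//     println!("cargo:rerun-if-env-changed=ORT_LIB_LOCATION");
//     if let Ok(dir) = std::env::var("ORT_LIB_LOCATION") {
//         for line in link_directives(Path::new(&dir), &["onnxruntime_session"]) {
//             println!("{line}");
//         }
//     }
```

One caveat worth checking: a build script receives the variable's literal value, so a leading `~` only works if the shell expanded it when the variable was set (it won't be expanded if the path was quoted).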

Hm, it looks like ort compiles fine, but the linking error appears when I use it through your diffusers library, since there is no libonnxruntime.a either.

Here's a list of the files and directories produced by the ROCm build of ONNX Runtime: https://gist.github.com/45585c5f7bd797bbe9c6e3998edf0b34

Looks like I did not properly implement static linking with ORT_STRATEGY=system. 5044e45 should fix (most of) the linking issues.

Now I'm getting:
error: could not find native static library protobuf-lited, perhaps an -L flag is missing?

error: could not compile ort due to previous error
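One plausible reading of this error, offered as an assumption rather than something confirmed in the thread: protobuf's CMake build appends a `d` postfix to library names in Debug builds, so `protobuf-lited` is the Debug name, while a Release build leaves `libprotobuf-lite.a`. A build script could avoid hard-coding either name by checking which archive actually exists. The helper `protobuf_lite_link_name` below is hypothetical and is not what ort does:

```rust
use std::path::Path;

/// Hypothetical helper: returns the name to pass to `rustc-link-lib=static=`
/// for protobuf-lite, based on which static archive exists in `lib_dir`.
fn protobuf_lite_link_name(lib_dir: &Path) -> Option<&'static str> {
    // Release builds produce libprotobuf-lite.a; Debug builds (assumed to use
    // protobuf's default "d" postfix) produce libprotobuf-lited.a.
    for name in ["protobuf-lite", "protobuf-lited"] {
        if lib_dir.join(format!("lib{name}.a")).is_file() {
            return Some(name);
        }
    }
    // Neither archive found: the caller should report a clear error.
    None
}
```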

Sorry for the delay. Linking should be fixed with 2364c5d.

I fixed the ort typo I mentioned in my last comment, which got ort compiling again, but I got https://gist.github.com/f0de93ba9fe3f0639a46d295b6f1e993 when compiling my program that uses your diffusers library.

Ah, looks like the ONNX docs were wrong. Can you change OrtSessionOptionsAppendExecutionProvider_ROCm in src/execution_providers.rs on line 18 and line 221 to OrtSessionOptionsAppendExecutionProvider_ROCM (capital M) and try again?

https://gist.github.com/d081fd8ecb2812aaa5fe1795129f183b

I think it gave the same error, but with the capital M this time.

OrtSessionOptionsAppendExecutionProvider_ROCM is defined in onnxruntime/core/session/provider_bridge_ort.cc, which is compiled into libonnxruntime_session.a, and ort does link that library, so I'm not sure why it still can't find the symbol. Maybe it's defined in another library that I can't see.

@sonicrules1234 what branch of microsoft/onnxruntime is checked out, and would it be possible to share the contents of the build directory?

Sorry it took so long to respond; I didn't get an email notification for this. Right now the files are on a partially corrupted partition. Once my PC is working properly again, I'll give this another try.

Okay, I got it back to where it was on a different install. I'm using the main branch of ONNX Runtime, and took the build output from the ROCm Docker build:
https://github.com/microsoft/onnxruntime/tree/main/dockerfiles#rocm
The build folder is 2.2 GB. If you have somewhere I can upload it, I'll upload it.

OrtSessionOptionsAppendExecutionProvider_ROCM has apparently been broken for a long time, so I pushed 8889616 using a newer API. Linking should finally be fixed, please let me know how testing goes 😁

OK, it compiles and runs now, but it doesn't seem to be using the GPU:

https://gist.github.com/6e435a1caf2290c9eec768e2793a36eb

Does it output anything when running with the environment variable RUST_LOG=ort=debug?

Nope

Add

[dependencies]
tracing = "0.1"
tracing-subscriber = "0.3"

to your Cargo.toml, and at the top of fn main(), add

fn main() {
    tracing_subscriber::fmt::init();

    // ... the rest of your program
}

then run it again with RUST_LOG=ort=debug to see the logs.

It's spamming this:

2023-03-05T22:29:09.229375Z DEBUG apply_execution_providers: ort::execution_providers: ROCm execution provider registration Err(Msg("/code/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1058 void onnxruntime::ProviderSharedLibrary::Ensure() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_shared.so with error: libonnxruntime_providers_shared.so: cannot open shared object file: No such file or directory\n"))
2023-03-05T22:29:09.353019Z DEBUG new{allocator=Device memory_type=Default}: ort::memory: Creating new OrtMemoryInfo.
2023-03-05T22:29:09.353052Z DEBUG drop{self=SessionBuilder { env: "default", allocator: Device, memory_type: Default }}: ort::session: Dropping the session options.

That shared library does exist at the root of ORT_LIB_LOCATION.

Try running with the env var LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORT_LIB_LOCATION
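When a library exists on disk but still fails to load, it can help to confirm what the process actually sees. Below is a small diagnostic sketch; `find_in_search_path` and `check_providers_shared` are hypothetical helpers, not part of ort. They split an LD_LIBRARY_PATH-style value and report which entry, if any, contains libonnxruntime_providers_shared.so. This only mirrors the LD_LIBRARY_PATH part of the loader's search (the real loader also consults RPATH/RUNPATH, the ld.so cache, and default directories). Also, on glibc-based systems the loader reads LD_LIBRARY_PATH when the process starts, so the variable must be exported in the environment that launches the program; if ORT_LIB_LOCATION was only set for the cargo build command, `$ORT_LIB_LOCATION` is empty in the shell that runs the binary.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical diagnostic: returns the first directory in a colon-separated
/// search path (such as LD_LIBRARY_PATH) that contains `lib_name` as a file.
fn find_in_search_path(search_path: &str, lib_name: &str) -> Option<PathBuf> {
    env::split_paths(search_path)
        // Empty entries are simply skipped in this sketch.
        .filter(|dir| !dir.as_os_str().is_empty())
        .map(|dir| dir.join(lib_name))
        .find(|candidate| candidate.is_file())
}

/// Checks the LD_LIBRARY_PATH that this process was started with.
fn check_providers_shared() -> Option<PathBuf> {
    let search_path = env::var("LD_LIBRARY_PATH").unwrap_or_default();
    find_in_search_path(&search_path, "libonnxruntime_providers_shared.so")
}
```

Calling `check_providers_shared()` at the top of `main` and printing the result shows whether the directory actually made it into the running program's environment.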

Doesn't seem to make a difference

Hello! I tried running it myself and hit the error above:

libonnxruntime_providers_shared.so: cannot open shared object file: No such file or directory

I fixed that by adding the library's directory to LD_LIBRARY_PATH. Here is my Dockerfile:

FROM rocm/dev-ubuntu-22.04:5.6-complete

ARG ONNXRUNTIME_REPO=https://github.com/Microsoft/onnxruntime
ARG ONNXRUNTIME_BRANCH=main

WORKDIR /code

ENV PATH /opt/miniconda/bin:/code/cmake-3.26.3-linux-x86_64/bin:${PATH}

RUN git clone --single-branch --branch ${ONNXRUNTIME_BRANCH} --recursive ${ONNXRUNTIME_REPO} onnxruntime &&\
    /bin/sh onnxruntime/dockerfiles/scripts/install_common_deps.sh &&\
    cd onnxruntime &&\
    /bin/sh ./build.sh --allow_running_as_root --config Release --update --build --parallel --cmake_extra_defines ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) --use_rocm --rocm_home=/opt/rocm \
    # Modified from this point
    --skip_submodule_sync  --skip_tests --build_shared_lib &&\
    cd build/Linux/Release/ &&\
    make install

ENV ONNXRUNTIME_DIR="/code/onnxruntime/build/Linux/Release/"
ENV LD_LIBRARY_PATH=$ONNXRUNTIME_DIR:$LD_LIBRARY_PATH

# Next I copy and run a Rust application

But now I get these logs:

2023-08-01T10:56:25.642746Z  INFO apply_execution_providers: ort::execution_providers: Successfully registered `ROCmExecutionProvider`
Segmentation fault (core dumped)

It is running inside Docker. Could this be because I'm using ROCm 5.6? How can I debug this segmentation fault?

EDIT: I just spent 30 minutes trying ROCm 5.4, and it doesn't work with that either.