pykeio / ort

Fast ML inference & training for Rust with ONNX Runtime

Home Page: https://ort.pyke.io/

Win11x64 STATUS_ACCESS_VIOLATION within OrtGetApiBase

davehorner opened this issue

I'm not sure what to make of this. I can use the library without issue through the application's HTTP server, but when I run it on a directory, the process fails with (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION).

I have used ort successfully in both release and debug builds, and I have tried the load-dynamic feature. I've checked that all the dependencies for onnxruntime look OK, and it works, but in some cases it crashes with STATUS_ACCESS_VIOLATION.

Exception 0xc0000005 encountered at address 0x7ffbd9b95253: Access violation reading location 0x7ffbad5ed490

OrtGetApiBase (@OrtGetApiBase:249)
OrtSessionOptionsAppendExecutionProvider_CUDA (@OrtSessionOptionsAppendExecutionProvider_CUDA:24955)
OrtSessionOptionsAppendExecutionProvider_CUDA (@OrtSessionOptionsAppendExecutionProvider_CUDA:24955)
void ort::session::impl$5::drop(struct ort::session::SessionPointerHolder *) (c:\Users\dhorner\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.16.3\src\session.rs:550)
void core::ptr::drop_in_place<ort::session::SessionPointerHolder>(struct ort::session::SessionPointerHolder *) (@core::ptr::drop_in_place<ort::session::SessionPointerHolder>:6)
void alloc::sync::Arc<ort::session::SessionPointerHolder,alloc::alloc::Global>::drop_slow<ort::session::SessionPointerHolder,alloc::alloc::Global>() (@alloc::sync::Arc<T,A>::drop_slow:10)
void alloc::sync::impl$33::drop<ort::session::SessionPointerHolder,alloc::alloc::Global>(struct alloc::sync::Arc<ort::session::SessionPointerHolder,alloc::alloc::Global> *) (@<alloc::sync::Arc<T,A> as core::ops::drop::Drop>::drop:23)
void core::ptr::drop_in_place<alloc::sync::Arc<ort::session::SessionPointerHolder,alloc::alloc::Global> >(struct alloc::sync::Arc<ort::session::SessionPointerHolder,alloc::alloc::Global> *) (@core::ptr::drop_in_place<alloc::sync::Arc<ort::session::SessionPointerHolder>>:6)
void core::ptr::drop_in_place<ort::session::Session>(struct ort::session::Session *) (@core::ptr::drop_in_place<ort::session::Session>:17)
union enum2$<core::task::poll::Poll<enum2$<core::result::Result<tuple$<>,anyhow::Error> > > > rust_background_removal::main::async_block$0(struct core::pin::Pin<ref_mut$<enum2$<rust_background_removal::main::async_block_env$0> > >, struct core::task::wake::Context *) (c:\w\rust-background-removal\src\main.rs:200)
static union enum2$<core::task::poll::Poll<enum2$<core::result::Result<tuple$<>,anyhow::Error> > > > tokio::runtime::park::impl$4::block_on::closure$0<enum2$<rust_background_removal::main::async_block_env$0> >(struct tokio::runtime::park::impl$4::block_on::closure_env$0<enum2$<rust_background_removal::main::async_block_env$0> >) (c:\Users\dhorner\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.33.0\src\runtime\park.rs:282)
union enum2$<core::result::Result<enum2$<core::result::Result<tuple$<>,anyhow::Error> >,std::thread::local::AccessError> > tokio::runtime::park::CachedParkThread::block_on<enum2$<rust_background_removal::main::async_block_env$0> >(union enum2$<rust_background_removal::main::async_block_env$0>) (c:\Users\dhorner\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.33.0\src\runtime\park.rs:282)
union enum2$<core::result::Result<enum2$<core::result::Result<tuple$<>,anyhow::Error> >,std::thread::local::AccessError> > tokio::runtime::context::blocking::BlockingRegionGuard::block_on<enum2$<rust_background_removal::main::async_block_env$0> >(union enum2$<rust_background_removal::main::async_block_env$0>) (c:\Users\dhorner\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.33.0\src\runtime\context\blocking.rs:66)
tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}} (c:\Users\dhorner\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.33.0\src\runtime\scheduler\multi_thread\mod.rs:87)
union enum2$<core::result::Result<tuple$<>,anyhow::Error> > tokio::runtime::context::runtime::enter_runtime<tokio::runtime::scheduler::multi_thread::impl$0::block_on::closure_env$0<enum2$<rust_background_removal::main::async_block_env$0> >,enum2$<core::result::Result<tuple$<>,anyhow::Error> > >(union enum2$<tokio::runtime::scheduler::Handle> *, bool, struct tokio::runtime::scheduler::multi_thread::impl$0::block_on::closure_env$0<enum2$<rust_background_removal::main::async_block_env$0> >, struct core::panic::location::Location *) (c:\Users\dhorner\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.33.0\src\runtime\context\runtime.rs:65)
union enum2$<core::result::Result<tuple$<>,anyhow::Error> > tokio::runtime::scheduler::multi_thread::MultiThread::block_on<enum2$<rust_background_removal::main::async_block_env$0> >(union enum2$<tokio::runtime::scheduler::Handle> *, union enum2$<rust_background_removal::main::async_block_env$0>) (c:\Users\dhorner\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.33.0\src\runtime\scheduler\multi_thread\mod.rs:86)
union enum2$<core::result::Result<tuple$<>,anyhow::Error> > tokio::runtime::runtime::Runtime::block_on<enum2$<rust_background_removal::main::async_block_env$0> >(union enum2$<rust_background_removal::main::async_block_env$0>, struct core::panic::location::Location *) (c:\Users\dhorner\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.33.0\src\runtime\runtime.rs:350)
static union enum2$<core::result::Result<tuple$<>,anyhow::Error> > rust_background_removal::main() (c:\w\rust-background-removal\src\main.rs:131)
union enum2$<core::result::Result<tuple$<>,anyhow::Error> > core::ops::function::FnOnce::call_once<enum2$<core::result::Result<tuple$<>,anyhow::Error> > (*)(),tuple$<> >( *) (@core::ops::function::FnOnce::call_once:6)
union enum2$<core::result::Result<tuple$<>,anyhow::Error> > std::sys_common::backtrace::__rust_begin_short_backtrace<enum2$<core::result::Result<tuple$<>,anyhow::Error> > (*)(),enum2$<core::result::Result<tuple$<>,anyhow::Error> > >( *) (@std::sys_common::backtrace::__rust_begin_short_backtrace:6)

The code is at https://github.com/davehorner/rust-background-removal, if that helps. I know it's not much to go on, but I tried to capture as much detail as possible.

cargo run with that repo is working fine for me. How exactly are you using it?

The --http option runs a server, and that works fine; --help also works. Just running it with no options causes the access violation.
In the debugger, execution reaches if !args.input_file.is_empty() { and the next step jumps to the attribute on main:

#[tokio::main(worker_threads = 10)]
// #[tokio::main(flavor = "current_thread")]
//#[tokio::main(worker_threads = 1)]

I had it set to 10. I just tried 1 worker thread and current_thread, and neither resolved it.

When I debug, it looks like the access violation actually happens after main returns Ok(())! That might explain why the HTTP server works fine...
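The stack trace above ends in ort::session's Drop implementation, reached from drop_in_place<ort::session::Session> in main's async block, which fits a crash during teardown rather than during inference. Below is a minimal std-only sketch of two ways to control when such a destructor runs, using a hypothetical FakeSession type in place of ort's Session: dropping it explicitly at a point of your choosing (so a debugger or log can catch it there), or skipping it with std::mem::forget. Forgetting a session leaks it until the process exits, so treat that as a diagnostic workaround under these assumptions, not a fix.

```rust
use std::cell::RefCell;

thread_local! {
    // Records the order of events so the teardown sequence can be inspected.
    static DROP_LOG: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

/// Hypothetical stand-in for ort's Session; in the real crash, its Drop is what faults.
struct FakeSession {
    name: &'static str,
}

impl Drop for FakeSession {
    fn drop(&mut self) {
        DROP_LOG.with(|log| log.borrow_mut().push(format!("drop {}", self.name)));
    }
}

fn run() -> Result<(), String> {
    // `_kept` is still a real binding (unlike `_`), so it is dropped when `run` returns.
    let _kept = FakeSession { name: "kept" };
    let explicit = FakeSession { name: "explicit" };
    let leaked = FakeSession { name: "leaked" };

    // Run this destructor now, at a point a debugger or log can easily catch.
    drop(explicit);
    DROP_LOG.with(|log| log.borrow_mut().push("work done".to_string()));

    // Never run this destructor; the value is leaked instead.
    std::mem::forget(leaked);

    Ok(())
}

fn main() {
    run().unwrap();
    // Prints ["drop explicit", "work done", "drop kept"].
    DROP_LOG.with(|log| println!("{:?}", log.borrow()));
}
```

Dropping the session explicitly before main returns moves the fault (if it still happens) to a known line, which makes it easier to tell whether the session's teardown itself is the problem or whether the teardown is merely running too late, after other parts of the process have already shut down.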

I cannot reproduce this with/without CUDA, with/without --http, or with any of the configurations of tokio::main you mentioned.

Could you set the environment variable RUST_LOG=ort=debug? That may give some information about what's happening.

Thanks for trying! This is the output after set RUST_LOG=ort=debug and running with no options:

2023-12-23T04:01:46.920168Z DEBUG ort::environment: Environment not yet initialized, creating a new one
2023-12-23T04:01:46.945897Z DEBUG ort::environment: Environment created env_ptr="0x12d4c5a3ee0"
2023-12-23T04:01:47.195056Z  INFO apply_execution_providers: ort::execution_providers: Successfully registered `CUDAExecutionProvider`
2023-12-23T04:01:47.728918Z DEBUG drop{self=Environment { env: Mutex { data: EnvironmentSingleton { name: "default", env_ptr: 0x12d4c5a3ee0 }, poisoned: false, .. }, execution_providers: [] }}: ort::environment: Dropping environment global_arc_count=2
2023-12-23T04:01:47.729263Z DEBUG drop{self=Environment { env: Mutex { data: EnvironmentSingleton { name: "default", env_ptr: 0x12d4c5a3ee0 }, poisoned: false, .. }, execution_providers: [] }}: ort::environment: Releasing environment global_arc_count=2
error: process didn't exit successfully: `target\release\rust-background-removal.exe` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)

The environment being dropped prematurely is either the culprit or a symptom. I don't know why I'm not seeing that on my end, though.
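The debug log shows ort dropping and releasing its global environment at shutdown (global_arc_count=2). The general ownership rule that keeps an environment valid for as long as a session uses it is shared ownership through Arc: each session holds a strong reference, so the environment's release runs only after the last holder is gone. The sketch below shows that ordering with hypothetical Environment and Session types; it illustrates the pattern only and does not reproduce ort's internals.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical environment; Drop stands in for releasing the native handle.
struct Environment {
    log: Arc<Mutex<Vec<&'static str>>>,
}

impl Drop for Environment {
    fn drop(&mut self) {
        self.log.lock().unwrap().push("environment released");
    }
}

/// Hypothetical session that keeps its environment alive through a shared Arc.
struct Session {
    _env: Arc<Environment>,
    log: Arc<Mutex<Vec<&'static str>>>,
}

impl Drop for Session {
    fn drop(&mut self) {
        self.log.lock().unwrap().push("session released");
    }
}

fn teardown_order() -> Vec<&'static str> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let env = Arc::new(Environment { log: Arc::clone(&log) });
    let session = Session { _env: Arc::clone(&env), log: Arc::clone(&log) };

    // Dropping the caller's handle does not release the environment,
    // because the session still holds a strong reference to it.
    drop(env);
    log.lock().unwrap().push("caller handle dropped");

    // Session::drop runs first; then its `_env` field is dropped,
    // which removes the last strong reference and releases the environment.
    drop(session);
    let result = log.lock().unwrap().clone();
    result
}

fn main() {
    println!("{:?}", teardown_order());
}
```

If a native handle were instead released while a session still pointed at it, the session's later teardown would touch freed memory, which is the kind of failure that surfaces as STATUS_ACCESS_VIOLATION.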

Does it still crash if you remove ort's cuda feature from the Cargo.toml?

    Finished release [optimized] target(s) in 31.45s
     Running `target\release\rust-background-removal.exe`
2023-12-23T04:08:35.594557Z DEBUG ort::environment: Environment not yet initialized, creating a new one
2023-12-23T04:08:35.620018Z DEBUG ort::environment: Environment created env_ptr="0x1cfc99c8910"
2023-12-23T04:08:35.620428Z DEBUG apply_execution_providers: ort::execution_providers: Execution provider `CUDAExecutionProvider` was not registered because its corresponding Cargo feature is disabled.
2023-12-23T04:08:35.620818Z DEBUG apply_execution_providers: ort::execution_providers: Execution provider `CoreMLExecutionProvider` was not registered because its corresponding Cargo feature is disabled.
2023-12-23T04:08:35.621495Z  INFO apply_execution_providers: ort::execution_providers: Successfully registered `CPUExecutionProvider`
2023-12-23T04:08:35.958028Z DEBUG drop{self=Environment { env: Mutex { data: EnvironmentSingleton { name: "default", env_ptr: 0x1cfc99c8910 }, poisoned: false, .. }, execution_providers: [] }}: ort::environment: Dropping environment global_arc_count=2
2023-12-23T04:08:35.958323Z DEBUG drop{self=Environment { env: Mutex { data: EnvironmentSingleton { name: "default", env_ptr: 0x1cfc99c8910 }, poisoned: false, .. }, execution_providers: [] }}: ort::environment: Releasing environment global_arc_count=2
Error: The system cannot find the path specified. (os error 3)
error: process didn't exit successfully: `target\release\rust-background-removal.exe` (exit code: 1)

It seems CUDA is the culprit.

Right, I remember running into something similar with CUDA in the past. It could be due to a missing zlibwapi.dll or cuDNN, or to having CUDA 12.x installed.

I was actually careful to install only CUDA 11 on this machine. What would zlibwapi.dll be a dependency of? It loads and works fine in HTTP mode; it seems like the teardown is what's off. I'll play with it some more. Now that it's running on the CPU, it's not working as I'd expect either... oh, fun.

zlibwapi.dll would be a dependency of what?

CUDA, for whatever godforsaken reason 😅

it loads it and works fine in http

In that case, could it be due to threading? Sessions registered with CUDA really don't like being sent across threads and can cause a STATUS_ACCESS_VIOLATION like this. Then again, I don't know why I can't reproduce it, and if that were the cause, I'd expect it to be the other way around (normal mode working, HTTP mode crashing).
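If a CUDA-backed session misbehaves when it is used or dropped on a thread other than the one that created it, one common pattern is to create, use, and drop it on a single dedicated worker thread and send it requests over a channel. Below is a std-only sketch of that pattern; Model is a hypothetical stand-in for the session, and its run method is a placeholder computation, not real inference.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a session that must stay on one thread.
struct Model {
    scale: f32,
}

impl Model {
    /// Placeholder for inference: multiplies every input by `scale`.
    fn run(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|x| x * self.scale).collect()
    }
}

/// A request carries its input and a channel for sending the result back.
struct Request {
    input: Vec<f32>,
    reply: mpsc::Sender<Vec<f32>>,
}

/// Spawns the worker; the Model is created, used, and dropped only on that thread.
fn spawn_worker() -> (mpsc::Sender<Request>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Request>();
    let handle = thread::spawn(move || {
        let model = Model { scale: 2.0 };
        // The loop ends once every Sender for this channel has been dropped.
        for req in rx {
            // Ignore send errors: the caller may have stopped waiting.
            let _ = req.reply.send(model.run(&req.input));
        }
        // `model` is dropped here, on the thread that created it.
    });
    (tx, handle)
}

/// Sends one request to the worker and blocks until the result comes back.
fn infer(worker: &mpsc::Sender<Request>, input: Vec<f32>) -> Vec<f32> {
    let (reply, result) = mpsc::channel();
    worker.send(Request { input, reply }).expect("worker thread stopped");
    result.recv().expect("worker dropped the reply channel")
}

fn main() {
    let (worker, handle) = spawn_worker();
    println!("{:?}", infer(&worker, vec![1.0, 2.5]));
    // Close the channel, then wait so teardown finishes before main returns.
    drop(worker);
    handle.join().unwrap();
}
```

Joining the worker before main returns means the session's destructor runs at a predictable point on its own thread, instead of racing the rest of the process shutdown.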

Well, I don't think I have zlibwapi.dll on the PATH, and figuring out which copy of that file I should use is a wild goose chase. I would expect a missing DLL to produce errors specifically during loading, not during unloading.
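One quick way to check which copies (if any) of a DLL are visible through PATH is to walk the PATH entries with std::env::split_paths. A std-only sketch follows; find_on_path is a hypothetical helper, and cudnn64_8.dll is assumed as the cuDNN 8 DLL name, so adjust the names for your install. Windows also searches other locations (such as the application's directory) when loading a DLL, so an empty result here does not by itself prove the DLL cannot be loaded.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Returns the full path of `file_name` in every directory of a PATH-style
/// list where it exists as a regular file, in search order.
fn find_on_path(path_var: &OsStr, file_name: &str) -> Vec<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(file_name))
        .filter(|candidate| candidate.is_file())
        .collect()
}

fn main() {
    let path_var = env::var_os("PATH").unwrap_or_default();
    // Assumed DLL names; change these to match your CUDA/cuDNN versions.
    for name in ["zlibwapi.dll", "cudnn64_8.dll"] {
        let hits = find_on_path(&path_var, name);
        if hits.is_empty() {
            println!("{name}: not found on PATH");
        } else {
            for hit in hits {
                println!("{name}: {}", hit.display());
            }
        }
    }
}
```

Listing every match rather than only the first also shows when several different copies of the same DLL are on the PATH, which is a common source of version mismatches.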

2023-12-23T15:05:18.542847Z DEBUG ort::environment: Environment not yet initialized, creating a new one
2023-12-23T15:05:18.567392Z DEBUG ort::environment: Environment created env_ptr="0x152d92ff920"
2023-12-23T15:05:18.877945Z  INFO apply_execution_providers: ort::execution_providers: Successfully registered `CUDAExecutionProvider`
2023-12-23T15:05:19.458356Z DEBUG drop{self=Environment { env: Mutex { data: EnvironmentSingleton { name: "default", env_ptr: 0x152d92ff920 }, poisoned: false, .. }, execution_providers: [] }}: ort::environment: Dropping environment global_arc_count=2
2023-12-23T15:05:19.458628Z DEBUG drop{self=Environment { env: Mutex { data: EnvironmentSingleton { name: "default", env_ptr: 0x152d92ff920 }, poisoned: false, .. }, execution_providers: [] }}: ort::environment: Releasing environment global_arc_count=2
error: process didn't exit successfully: `target\release\rust-background-removal.exe -I """` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)

Plain #[tokio::main] behaves the same way. I thought it was the threads, but I just don't know.
We can close this if you want; I don't know what else to do with it, but just know that it's occurring.
It does work, though, and thank you for the software.