rayon-rs / rayon

We are using a rayon thread pool to run a Rust-based userspace driver. A mutex must be acquired before any call can be made to the driver.

We are running into a common deadlock problem with rayon work-stealing.
This issue occurs because the third party code that invokes our driver is also using rayon.

The deadlock results from the following events:

1. The third-party global thread pool is running N jobs.
2. Third-party thread A invokes our library.
3. Thread A acquires our driver mutex.
4. Thread A creates a scope within our driver thread pool.
5. While waiting for that scope to finish, thread A steals work from the third-party pool.
6. The stolen job invokes our library again, still on thread A.
7. Thread A is now waiting on a lock that it already holds.
8. Deadlock.
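
The re-entrancy at the end of that sequence can be reproduced on a single thread without rayon at all. The following is only a sketch: `Driver`, `call`, and `reentrant_call_is_refused` are hypothetical names, not part of our driver or of rayon, and the sketch uses `try_lock` so that the second acquisition reports failure instead of hanging the way a plain `lock` would.

```rust
use std::sync::{Mutex, TryLockError};

/// Hypothetical stand-in for the driver: every entry point takes one mutex.
struct Driver {
    lock: Mutex<()>,
}

impl Driver {
    fn new() -> Self {
        Driver { lock: Mutex::new(()) }
    }

    /// Runs `body` while holding the driver mutex.
    /// Returns `None` if the mutex is already held (assumption: the real
    /// driver uses a blocking `lock`, which would wait forever here).
    fn call<R>(&self, body: impl FnOnce() -> R) -> Option<R> {
        match self.lock.try_lock() {
            // `_guard` keeps the mutex locked until `body` has returned.
            Ok(_guard) => Some(body()),
            Err(TryLockError::WouldBlock) => None,
            Err(TryLockError::Poisoned(_)) => panic!("driver mutex poisoned"),
        }
    }
}

/// Simulates steps 3 to 7: the outer call holds the mutex, and a "stolen"
/// job re-enters the driver on the same thread before the outer call ends.
fn reentrant_call_is_refused() -> Option<Option<u32>> {
    let driver = Driver::new();
    driver.call(|| {
        // Stands in for the job that thread A steals while it waits.
        driver.call(|| 42)
    })
}
```

The outer call succeeds and the inner call is refused, so the function returns `Some(None)`. With a blocking `lock` in place of `try_lock`, the inner call would never return.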

The rayon thread pool API fits our use-case perfectly, but the implementation detail of work-stealing causes deadlocks.

Is there an easy way to modify the thread pool internals to prevent work-stealing?

I'm not looking to merge with upstream unless this feature is wanted by others. I'd be OK with maintaining a non-work-stealing fork if a maintainer could give me some tips for how to do this.

It may be worth noting that our driver thread pool executes exactly as many jobs as there are threads in the pool.

The general issue with Mutex is also described in #592.

One of the ideas in that thread is to have a fully blocking version of ThreadPool::install, so that a thread making a cross-pool call no longer steals work from its own (first) pool while it waits. In your scenario, that should block "third party thread A" until your driver is done.
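
To make the idea concrete, here is a minimal std-only sketch of a pool whose `install` simply parks the caller until the job is done. It is not rayon's implementation, and `BlockingPool` is a hypothetical type. Unlike rayon's `install`, this sketch requires a `'static` closure to keep the code safe and short.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical pool: workers pull from one shared queue and never steal.
struct BlockingPool {
    sender: Option<Sender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl BlockingPool {
    fn new(threads: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..threads)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The queue lock is released at the end of this statement,
                    // so it is not held while the job runs.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // sender dropped: shut down
                    }
                })
            })
            .collect();
        BlockingPool { sender: Some(sender), workers }
    }

    /// Runs `op` on a pool thread. The caller sleeps on a channel until the
    /// result arrives; it does not execute any other work in the meantime.
    fn install<R, F>(&self, op: F) -> R
    where
        F: FnOnce() -> R + Send + 'static,
        R: Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            let _ = tx.send(op());
        });
        self.sender
            .as_ref()
            .expect("pool is shut down")
            .send(job)
            .expect("all workers exited");
        rx.recv().expect("job panicked before sending a result")
    }
}

impl Drop for BlockingPool {
    fn drop(&mut self) {
        // Closing the channel makes every worker leave its loop.
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}
```

Because the caller blocks in `rx.recv()` rather than running queued jobs, a thread that holds the driver mutex cannot pick up a second job that re-enters the driver. The cost is that the calling thread sits idle for the duration of the call.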

Yes, that would solve my problem.

Is there any ongoing work in that direction? Or any tips for how to implement such a feature?

It looks like the meat of the work would be in and around this function.

rayon/rayon-core/src/registry.rs, line 533 in 3e3962c:

    unsafe fn in_worker_cross<OP, R>(&self, current_thread: &WorkerThread, op: OP) -> R

Or rather, avoid that function in this case and call in_worker_cold instead.

I implemented the suggestion in #1175. I did not call in_worker_cold, because that function has a useful-looking debug assertion that conflicts with this use case.

Thread pool without work stealing