Thread pool without work stealing
nhukc opened this issue
We are using a rayon thread pool to run a Rust-based userspace driver. A mutex must be acquired before any call can be made into the driver.
We are running into a common deadlock problem with rayon work-stealing.
This issue occurs because the third-party code that invokes our driver also uses rayon.
The deadlock results from the following events:
- Third-party global thread pool is running N jobs
- Third-party thread A (a worker of that global pool) invokes our library
- Third-party thread A acquires our driver mutex
- Third-party thread A creates a scope within our driver thread pool
- While waiting for that scope to finish, third-party thread A steals another of the N jobs from the global pool
- That stolen job, still running on thread A, invokes our library
- Third-party thread A now waits on a lock that it already holds
- Deadlock
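The last steps hinge on the fact that std::sync::Mutex is not re-entrant: the standard library documents that calling lock() on a mutex the current thread already holds will not return (it may deadlock or panic). A minimal std-only sketch of that situation, where the hypothetical helper try_call_driver stands in for a call into the driver. It uses try_lock so the re-entrant attempt reports failure instead of hanging:

```rust
use std::sync::{Mutex, TryLockError};

/// Hypothetical stand-in for "a call into our driver": it needs the driver
/// mutex. `try_lock` is used instead of `lock` so that a re-entrant attempt
/// returns `false` rather than hanging the thread.
fn try_call_driver(driver: &Mutex<u64>) -> bool {
    match driver.try_lock() {
        Ok(mut calls) => {
            *calls += 1; // pretend to issue one driver call
            true
        }
        Err(TryLockError::WouldBlock) => false,
        Err(TryLockError::Poisoned(_)) => panic!("driver mutex poisoned"),
    }
}

fn main() {
    let driver = Mutex::new(0u64);

    // The calling thread takes the driver lock first.
    let outer = driver.lock().unwrap();

    // Same thread, same mutex: a blocking `lock()` here would never return,
    // which is exactly the state the stolen job ends up in.
    println!("re-entrant call: {}", try_call_driver(&driver)); // false

    // Releasing the outer guard lets the same call succeed.
    drop(outer);
    println!("call after release: {}", try_call_driver(&driver)); // true
}
```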
The rayon thread pool API fits our use case perfectly, but the implementation detail of work stealing causes deadlocks.
Is there an easy way to modify the thread pool internals to prevent work stealing?
I'm not looking to merge with upstream unless this feature is wanted by others. I'd be OK with maintaining a non-work-stealing fork if a maintainer could give me some tips on how to do this.
It may be worth noting that our driver thread pool executes exactly as many jobs as there are threads in the pool.
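Because the driver pool runs exactly one job per thread, it may not need a work-stealing scheduler at all. A minimal std-only sketch, assuming a fresh OS thread per job is acceptable for the driver (the issue does not say whether the driver needs long-lived threads) and Rust 1.63 or later for std::thread::scope: one thread is spawned per job and the caller simply blocks in join, so it never picks up unrelated work while it holds the driver mutex. run_on_dedicated_threads is a hypothetical helper, not a rayon API.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical helper: runs `job(0..n)` on `n` dedicated scoped threads and
/// blocks the caller until every job has returned. The caller only waits in
/// `join`; it never executes other queued work, so there is no stealing.
fn run_on_dedicated_threads<F>(n: usize, job: F) -> Vec<u64>
where
    F: Fn(usize) -> u64 + Sync,
{
    let job = &job; // shared borrow handed to every thread
    thread::scope(|s| {
        // Exactly one thread per job, matching "as many jobs as threads".
        let handles: Vec<_> = (0..n).map(|i| s.spawn(move || job(i))).collect();
        // Results come back in job order because we join in spawn order.
        handles
            .into_iter()
            .map(|h| h.join().expect("driver job panicked"))
            .collect()
    })
}

fn main() {
    let driver = Mutex::new(());
    // The caller holds the driver lock for the whole batch, as in the issue.
    let _guard = driver.lock().unwrap();
    let results = run_on_dedicated_threads(4, |i| (i as u64) * 10);
    println!("{:?}", results); // [0, 10, 20, 30]
}
```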
The general issue with Mutex is also described in #592.
One of the ideas in that thread is a fully blocking version of ThreadPool::install, so that a cross-pool call no longer steals work in the calling pool while it waits. In your scenario, that should block "third-party thread A" until your driver is done.
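If the driver does need persistent threads, the blocking behavior described above can be sketched without rayon as a small fixed-size pool whose callers park on a channel until their job returns. BlockingPool and install_blocking are hypothetical names for illustration; this is not how rayon implements ThreadPool::install, and it replaces rayon's scheduler with a single shared FIFO queue (no per-thread deques, no stealing):

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical fixed-size pool: workers pull jobs from one shared queue,
/// and callers block while their job runs.
struct BlockingPool {
    sender: Option<Sender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl BlockingPool {
    fn new(num_threads: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        // std's Receiver is single-consumer, so share it behind a Mutex.
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..num_threads)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The queue lock is released at the end of this statement.
                    let next = receiver.lock().unwrap().recv();
                    match next {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed: pool is shutting down
                    }
                })
            })
            .collect();
        BlockingPool { sender: Some(sender), workers }
    }

    /// Runs `op` on a pool thread and blocks the calling thread until it
    /// returns. The caller only parks in `recv`; it never runs other jobs.
    fn install_blocking<R, OP>(&self, op: OP) -> R
    where
        OP: FnOnce() -> R + Send + 'static,
        R: Send + 'static,
    {
        let (result_tx, result_rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            let _ = result_tx.send(op());
        });
        self.sender.as_ref().unwrap().send(job).expect("pool has shut down");
        result_rx.recv().expect("job panicked before sending a result")
    }
}

impl Drop for BlockingPool {
    fn drop(&mut self) {
        // Closing the channel makes every worker's `recv` fail, ending its loop.
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}

fn main() {
    let pool = BlockingPool::new(4);
    let answer = pool.install_blocking(|| 6 * 7);
    println!("{}", answer); // 42
}
```

The trade-off is the one asked for in the issue: a thread that calls install_blocking stays idle until its job completes, so it cannot re-enter the driver, but it also does no work for its own pool in the meantime.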
Yes, that would solve my problem.
Is there any ongoing work in that direction? Or any tips on how to implement such a feature?
It looks like the meat of the work would be in and around this function.
rayon/rayon-core/src/registry.rs, line 533 at commit 3e3962c
Or rather, avoid that function in this case and call in_worker_cold instead.
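As I read rayon-core, the two paths differ mainly in how the caller waits: the cross-pool path keeps the calling worker busy with jobs from its own pool until the injected job's latch is set, while the cold path, normally taken by threads that are not rayon workers, parks the caller on a lock-based latch. That "park, don't help" style of waiting can be sketched with a Mutex and a Condvar. BlockingLatch and run_elsewhere_and_wait are hypothetical names, not rayon's code, and the spawned thread stands in for a worker of the driver pool (with a real pool the caller could not simply join that worker, which is why a latch is needed):

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical one-shot latch: `wait` parks the calling thread on a condition
/// variable until `set` is called. The waiter does no other work meanwhile.
struct BlockingLatch {
    done: Mutex<bool>,
    cvar: Condvar,
}

impl BlockingLatch {
    fn new() -> Self {
        BlockingLatch { done: Mutex::new(false), cvar: Condvar::new() }
    }

    /// Marks the latch as set and wakes every waiting thread.
    fn set(&self) {
        *self.done.lock().unwrap() = true;
        self.cvar.notify_all();
    }

    /// Blocks until `set` has been called; loops to tolerate spurious wakeups.
    fn wait(&self) {
        let mut done = self.done.lock().unwrap();
        while !*done {
            done = self.cvar.wait(done).unwrap();
        }
    }
}

/// Runs `op` on another thread (a stand-in for injecting a job into another
/// pool) and parks the caller on the latch until the result is ready.
fn run_elsewhere_and_wait<T, F>(op: F) -> T
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let latch = Arc::new(BlockingLatch::new());
    let slot = Arc::new(Mutex::new(None));
    let (l, s) = (Arc::clone(&latch), Arc::clone(&slot));
    let handle = thread::spawn(move || {
        // Catch a panic so the latch is always set and the waiter cannot hang.
        let outcome = panic::catch_unwind(AssertUnwindSafe(op));
        *s.lock().unwrap() = Some(outcome);
        l.set();
    });
    latch.wait(); // park here instead of stealing work
    handle.join().expect("worker thread failed");
    let outcome = slot.lock().unwrap().take().expect("latch set without a result");
    match outcome {
        Ok(value) => value,
        Err(payload) => panic::resume_unwind(payload), // re-raise on the caller
    }
}

fn main() {
    println!("{}", run_elsewhere_and_wait(|| 6 * 7)); // 42
}
```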