rayon-rs / rayon

Rayon: A data parallelism library for Rust

Thread pool without work stealing

nhukc opened this issue · comments

commented

We are using a rayon thread pool to run a Rust-based userspace driver. A mutex must be acquired before any call can be made to the driver.

We are running into a common deadlock caused by rayon's work stealing.
It occurs because the third party code that invokes our driver also uses rayon.

The deadlock results from the following events:

  1. Third party global thread pool is running N jobs
  2. Third party thread A invokes our library
  3. Third party thread A acquires our driver mutex
  4. Third party thread A creates a scope within our driver thread pool
  5. While waiting for that scope to finish, third party thread A steals another job from the third party pool
  6. The stolen job, running on third party thread A, invokes our library
  7. Third party thread A is now waiting on a lock that it already holds
  8. Deadlock
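
To make the end state of steps 3 to 7 concrete, here is a minimal std-only sketch. It does not use rayon. The function name reentrant_acquire_would_block is hypothetical, and try_lock stands in for the second lock call so that the example returns instead of hanging.

```rust
use std::sync::{Mutex, TryLockError};

/// Returns true if a second acquisition on the same thread would have to wait.
/// Hypothetical helper for illustration; not part of the driver or of rayon.
fn reentrant_acquire_would_block(driver_lock: &Mutex<()>) -> bool {
    // Step 3: the calling thread takes the driver mutex and keeps holding it.
    let _guard = driver_lock.lock().unwrap();

    // Steps 5-7: a stolen job on the same thread calls back into the library.
    // A real call would use lock() and never return; try_lock() only reports
    // that the mutex is already taken.
    let second_attempt = driver_lock.try_lock();
    let would_block = matches!(second_attempt, Err(TryLockError::WouldBlock));
    would_block
}

fn main() {
    let driver_lock = Mutex::new(());
    println!("{}", reentrant_acquire_would_block(&driver_lock));
}
```

std::sync::Mutex is not reentrant, so the thread that already holds the guard cannot make progress on a second lock() call.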

The rayon thread pool API fits our use-case perfectly, but the implementation detail of work-stealing causes deadlocks.

Is there an easy way to modify the thread pool internals to prevent work-stealing?

I'm not looking to merge with upstream unless this feature is wanted by others. I'd be OK with maintaining a non-work-stealing fork if a maintainer could give me some tips for how to do this.

commented

It may be worth noting that our driver thread pool executes exactly as many jobs as there are threads in the pool.

The general issue with Mutex is also described in #592.

One of the ideas in that thread is to have a fully blocking version of ThreadPool::install, so that a cross-pool call no longer work-steals in the calling pool while it waits. In your scenario, that should block "third party thread A" until your driver is done.
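
As a rough illustration of what "fully blocking" means here, the following std-only sketch hands a closure to a fixed set of worker threads and parks the caller on a channel until the result arrives. BlockingPool and its install method are hypothetical names for this example. This is not rayon's implementation or API, and it deliberately omits scopes, panic propagation, and borrowing from the caller's stack.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical stand-in for a driver pool: fixed workers, one shared queue,
/// no work stealing.
struct BlockingPool {
    queue: Sender<Job>,
}

impl BlockingPool {
    fn new(num_threads: usize) -> Self {
        let (queue, jobs) = channel::<Job>();
        let jobs = Arc::new(Mutex::new(jobs));
        for _ in 0..num_threads {
            let jobs = Arc::clone(&jobs);
            thread::spawn(move || loop {
                // The queue lock is held only while receiving, not while
                // the job itself runs.
                let next = jobs.lock().unwrap().recv();
                match next {
                    Ok(job) => job(),
                    Err(_) => break, // pool dropped, queue closed
                }
            });
        }
        BlockingPool { queue }
    }

    /// Runs `op` on a pool thread. The caller blocks on a channel until the
    /// result arrives and executes no other work in the meantime.
    fn install<R, F>(&self, op: F) -> R
    where
        F: FnOnce() -> R + Send + 'static,
        R: Send + 'static,
    {
        let (done, result) = channel();
        self.queue
            .send(Box::new(move || {
                let _ = done.send(op());
            }))
            .expect("pool workers have exited");
        result.recv().expect("job ended without sending a result")
    }
}

fn main() {
    let driver_pool = BlockingPool::new(2);
    let driver_lock = Mutex::new(());

    // The caller holds the driver mutex across the cross-pool call.
    let _guard = driver_lock.lock().unwrap();
    let sum = driver_pool.install(|| (1..=4u32).sum::<u32>());
    println!("{}", sum);
}
```

Because the caller only waits in recv, it cannot pick up another job that re-enters the library while the driver mutex is held, which is the property the blocking install idea is after.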

commented

Yes, that would solve my problem.

Is there any ongoing work in that direction? Or tips for how to implement such a feature?

commented

It looks like the meat of the work would be in and around this function.

unsafe fn in_worker_cross<OP, R>(&self, current_thread: &WorkerThread, op: OP) -> R

Or rather, avoid that function in this case and call in_worker_cold instead.

commented

I implemented the suggestion in #1175. I did not call in_worker_cold because that function has a useful-looking debug assert that conflicts with this use case.