rayon-rs / rayon

Rayon: A data parallelism library for Rust

Using async iterator-like SQLX fetch with Rayon

nyurik opened this issue

Cross-posted on Stack Overflow:

The Rust SQLX library provides an iterator-like interface, fetch(...), usually used with a while let Some(row) = rows.try_next().await? {...} construct. In my case each row may take some time to process, so I would like to use Rayon's par_iter(), but that requires a real iterator. Using fetch_all is not an option because all of the rows may simply not fit into memory.
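For concreteness, the sequential version looks roughly like this (MyRow, the query, and process are just placeholders for my real schema and per-row work; this assumes sqlx's tokio runtime and postgres features):

```rust
use futures_util::TryStreamExt; // provides try_next() on the stream returned by fetch()

#[derive(sqlx::FromRow)]
struct MyRow {
    id: i64,
    payload: String,
}

fn process(_row: &MyRow) {
    // slow, CPU-bound per-row work goes here
}

async fn run(pool: &sqlx::PgPool) -> Result<(), sqlx::Error> {
    let mut rows = sqlx::query_as::<_, MyRow>("SELECT id, payload FROM items").fetch(pool);
    while let Some(row) = rows.try_next().await? {
        process(&row); // one row at a time -- this is the part I'd like to parallelize
    }
    Ok(())
}
```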

How can I use Rayon's par_iter to process millions of rows produced by sqlx's fetch stream?

You may have an easier time if you just use an async threadpool, like tokio provides.

What sort of results come out of your parallel process? The shape of that will affect how you might shoehorn this into rayon.

Thx @cuviper, each operation is essentially a bool -- in a way, I am trying to parallelize validation of a large dataset, searching for any rows that do not pass validation. So ordering is not important, but it would be good to abort early if any validation fails.

One way you could try is with tokio channels, using an async send and a sync blocking_recv, something like:

std::iter::from_fn(move || rx.blocking_recv()).par_bridge() //...
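A fleshed-out sketch of that idea might look something like the following; MyRow, validate, the query, the connection string, and the channel capacity are all placeholders, and it assumes sqlx's tokio runtime and postgres features:

```rust
use futures_util::TryStreamExt;
use rayon::iter::{ParallelBridge, ParallelIterator};
use tokio::sync::mpsc;

#[derive(sqlx::FromRow)]
struct MyRow {
    id: i64,
    payload: String,
}

fn validate(_row: &MyRow) -> bool {
    true // placeholder for the real per-row check
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let pool = sqlx::PgPool::connect("postgres://localhost/mydb").await?;

    // Bounded channel so the database reader can't run far ahead of the
    // validation threads; this keeps memory use bounded.
    let (tx, mut rx) = mpsc::channel::<MyRow>(1024);

    // Producer: drive the sqlx stream on the async runtime and push rows
    // into the channel.
    let producer = tokio::spawn(async move {
        let mut rows = sqlx::query_as::<_, MyRow>("SELECT id, payload FROM items").fetch(&pool);
        while let Some(row) = rows.try_next().await? {
            if tx.send(row).await.is_err() {
                break; // receiver dropped, e.g. validation already failed
            }
        }
        Ok::<_, sqlx::Error>(())
    });

    // Consumer: run the rayon side on a blocking thread so blocking_recv()
    // never stalls the async executor threads.
    let all_valid = tokio::task::spawn_blocking(move || {
        std::iter::from_fn(move || rx.blocking_recv())
            .par_bridge()
            .all(|row| validate(&row)) // `all` short-circuits on the first failure
    })
    .await?;

    producer.await??;
    println!("all rows valid: {all_valid}");
    Ok(())
}
```

The bounded channel keeps the working set small, and wrapping the rayon side in spawn_blocking keeps the blocking receive off the async worker threads.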

If you are using sqlx and are therefore already running inside an async runtime that provides a thread pool, I also suspect that just chunking the rows with something like StreamExt::chunks and spawning a task for each chunk would be preferable to trying to force this into a parallel iterator. Alternatively, spawning a task per hardware thread and feeding them through an async mpmc channel like flume or kanal might work as well.
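For the second alternative, a rough sketch with flume (kanal exposes a very similar API) could look like this, reusing the MyRow and validate placeholders from the sketch above; the channel capacity and fallback worker count are arbitrary:

```rust
use futures_util::TryStreamExt;

async fn validate_all(pool: sqlx::PgPool) -> Result<bool, sqlx::Error> {
    let (tx, rx) = flume::bounded::<MyRow>(1024);

    // One worker task per hardware thread; each pulls rows until the
    // channel closes or it sees an invalid row.
    let worker_count = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(4);
    let workers: Vec<_> = (0..worker_count)
        .map(|_| {
            let rx = rx.clone();
            tokio::spawn(async move {
                while let Ok(row) = rx.recv_async().await {
                    if !validate(&row) {
                        return false; // this worker stops early on failure
                    }
                }
                true
            })
        })
        .collect();
    drop(rx);

    // Producer: stream rows from the database into the channel.
    let mut rows = sqlx::query_as::<_, MyRow>("SELECT id, payload FROM items").fetch(&pool);
    while let Some(row) = rows.try_next().await? {
        if tx.send_async(row).await.is_err() {
            break; // every worker has already exited
        }
    }
    drop(tx); // close the channel so the remaining workers finish

    let mut all_valid = true;
    for worker in workers {
        all_valid &= worker.await.unwrap_or(false);
    }
    Ok(all_valid)
}
```

Since the validation is CPU-bound, these worker tasks will effectively occupy the runtime's worker threads while the query runs, which is fine as long as the application isn't doing other latency-sensitive async work at the same time.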