rayon-rs / rayon

Rayon: A data parallelism library for Rust

Slowdown when using deeply nested vector

arnabanimesh opened this issue · comments

OS: Windows 11 23H2 22631.3296 (64 bit)
Rust version: 1.77.1
Rayon version: 1.10.0

Minimum reproducible example elaborating the issue:

use rayon::prelude::*;

#[allow(dead_code)]
#[derive(Clone)]
struct DummyStruct {
    x: i32,
}

fn f() -> Vec<DummyStruct> {
    vec![]
}

fn parentf() -> Vec<DummyStruct> {
    // Runs quickly when the outer vector is left empty:
    // let _: Vec<Vec<Vec<Vec<Vec<DummyStruct>>>>> = Vec::new();
    // Slows down considerably once the four outer levels are populated
    // (2 x 2 x 1000 x 1000 empty innermost vectors):
    let _: Vec<Vec<Vec<Vec<Vec<DummyStruct>>>>> =
        vec![vec![vec![vec![Vec::new(); 1000]; 1000]; 2]; 2];
    f()
}

fn solve(idx: usize) {
    parentf();
    // Print the index to see where it slows down
    println!("{}", idx);
}

fn main() {
    (0..800).collect::<Vec<usize>>()
        .par_iter()
        .for_each(|&idx| solve(idx));
}

A similar issue was also reported to tokio: tokio-rs/tokio#6458

Note that rayon is not involved in that deeply-nested Vec, apart from getting you into that multi-threaded context in the first place. You should use a profiler, but I expect you'll see that most of your time is either in alloc/dealloc or the Drop for Vec itself (at multiple levels).
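One way to check this without a profiler is to take rayon out of the picture entirely. The following is only a rough sketch, not a benchmark: it builds the same nested shape on plain `std::thread` threads and times construction and drop separately. The names `build_nested`, `time_build_and_drop`, and `run_on_threads` are made up for this illustration, and the thread and call counts in `main` are arbitrary.

```rust
use std::hint::black_box;
use std::thread;
use std::time::{Duration, Instant};

#[allow(dead_code)]
#[derive(Clone)]
struct DummyStruct {
    x: i32,
}

type Nested = Vec<Vec<Vec<Vec<Vec<DummyStruct>>>>>;

/// Builds the same shape as in the report, with the two large dimensions
/// replaced by `n` so the cost can be scaled up or down.
fn build_nested(n: usize) -> Nested {
    vec![vec![vec![vec![Vec::new(); n]; n]; 2]; 2]
}

/// Times construction and drop separately on the calling thread.
fn time_build_and_drop(n: usize) -> (Duration, Duration) {
    let start = Instant::now();
    let nested = black_box(build_nested(n));
    let build = start.elapsed();

    let start = Instant::now();
    drop(nested);
    (build, start.elapsed())
}

/// Splits `calls` build/drop cycles across `threads` plain OS threads
/// (`threads` must be at least 1) and returns the wall-clock time.
fn run_on_threads(threads: usize, calls: usize, n: usize) -> Duration {
    let start = Instant::now();
    thread::scope(|s| {
        for t in 0..threads {
            s.spawn(move || {
                // Thread `t` handles every `threads`-th call.
                for _ in (t..calls).step_by(threads) {
                    drop(black_box(build_nested(n)));
                }
            });
        }
    });
    start.elapsed()
}

fn main() {
    let (build, dropped) = time_build_and_drop(1000);
    println!("single call: build {build:?}, drop {dropped:?}");

    for threads in [1, 2, 4, 8] {
        let elapsed = run_on_threads(threads, 64, 1000);
        println!("{threads} thread(s), 64 calls: {elapsed:?}");
    }
}
```

If the batch time stops improving, or gets worse, as the thread count grows, that would point at the allocator rather than at rayon's scheduling.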

@cuviper I also think that alloc and dealloc might be the issue, but I didn't expect creating nested vectors to have this much overhead. I will check with a profiler, though.

Check the code now; I have simplified it further. It turns out recursion was not the problem.

On WSL2 Ubuntu 22.04 it runs fine, but I can't generate a flamegraph from perf.data (I tried cargo-flamegraph and inferno) because of IO/CPU overload and out-of-order events. I suspect the problem lies in how Windows allocates memory or in the Rust binary generated on Windows.

I posted the issue in the Rust repo: rust-lang/rust#123447

It turns out HeapAlloc was the culprit, as mentioned in the issue I posted in the Rust repo.
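To get a feel for how much work each call hands to the allocator, here is a minimal sketch that wraps the standard `System` allocator (which is backed by HeapAlloc on Windows) and counts requests. `CountingAlloc`, `count_allocations`, and `build_nested` are hypothetical names introduced for this illustration; they are not part of rayon or of the original report.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);
static ALLOC_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Forwards to the system allocator and counts every allocation request.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        ALLOC_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

#[allow(dead_code)]
#[derive(Clone)]
struct DummyStruct {
    x: i32,
}

/// Same nested vector as in the report.
fn build_nested() -> Vec<Vec<Vec<Vec<Vec<DummyStruct>>>>> {
    vec![vec![vec![vec![Vec::new(); 1000]; 1000]; 2]; 2]
}

/// Returns (allocation calls, bytes requested) made while `f` runs.
/// Counters are process-wide, so other threads would add noise.
fn count_allocations<T>(f: impl FnOnce() -> T) -> (usize, usize) {
    let calls_before = ALLOC_CALLS.load(Ordering::Relaxed);
    let bytes_before = ALLOC_BYTES.load(Ordering::Relaxed);
    let value = black_box(f());
    let calls = ALLOC_CALLS.load(Ordering::Relaxed) - calls_before;
    let bytes = ALLOC_BYTES.load(Ordering::Relaxed) - bytes_before;
    drop(value);
    (calls, bytes)
}

fn main() {
    let (calls, bytes) = count_allocations(build_nested);
    println!("one call: {calls} allocations, {bytes} bytes requested");
}
```

On a 64-bit target this should report on the order of four thousand allocations and roughly 96 MB per call, since each of the 2 x 2 x 1000 third-level vectors needs its own buffer for 1000 empty inner vectors. With 800 such calls spread across worker threads, all of those requests and the matching frees go through the process heap. The same `#[global_allocator]` attribute is how a different allocator could be swapped in to compare, though that is not tested here.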