Slowdown when using deeply nested vector
arnabanimesh opened this issue
OS: Windows 11 23H2 22631.3296 (64 bit)
Rust version: 1.77.1
Rayon version: 1.10.0
Minimal reproducible example demonstrating the issue:
use rayon::prelude::*;

#[allow(dead_code)]
#[derive(Clone)]
struct DummyStruct {
    x: i32,
}

fn f() -> Vec<DummyStruct> {
    vec![]
}

fn parentf() -> Vec<DummyStruct> {
    // Works properly on a simple vector
    // let _: Vec<Vec<Vec<Vec<Vec<DummyStruct>>>>> = Vec::new();
    // Slows down considerably on a level 4 nested vec with a struct
    let _: Vec<Vec<Vec<Vec<Vec<DummyStruct>>>>> =
        vec![vec![vec![vec![Vec::new(); 1000]; 1000]; 2]; 2];
    f()
}

fn solve(idx: usize) {
    parentf();
    // To check where it is slowing down
    println!("{}", idx);
}

fn main() {
    (0..800)
        .collect::<Vec<usize>>()
        .par_iter()
        .for_each(|&idx| solve(idx));
}
Similar bug reported to tokio too: tokio-rs/tokio#6458
Note that rayon is not involved in that deeply-nested Vec, apart from getting you into that multi-threaded context in the first place. You should use a profiler, but I expect you'll see that most of your time is either in alloc/dealloc or the Drop for Vec itself (at multiple levels).
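Before reaching for a full profiler, a rough way to check the alloc/dealloc hypothesis is to time construction and destruction of one nested vector separately on a single thread. The sketch below uses only the standard library; `build` and `time_build_and_drop` are hypothetical helper names, not part of the original report, and the timings it prints will vary by platform and allocator.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

#[derive(Clone)]
struct DummyStruct {
    #[allow(dead_code)]
    x: i32,
}

type Nested = Vec<Vec<Vec<Vec<Vec<DummyStruct>>>>>;

/// Builds the same shape as the reproduction, with the two large
/// dimensions set to `n` (the report uses n = 1000).
fn build(n: usize) -> Nested {
    vec![vec![vec![vec![Vec::new(); n]; n]; 2]; 2]
}

/// Returns (time to build, time to drop) for one nested vector.
fn time_build_and_drop(n: usize) -> (Duration, Duration) {
    let start = Instant::now();
    let v = build(n);
    // Keep the optimizer from discarding the value before it is measured.
    black_box(&v);
    let build_time = start.elapsed();

    let start = Instant::now();
    drop(v);
    let drop_time = start.elapsed();

    (build_time, drop_time)
}

fn main() {
    let (build_time, drop_time) = time_build_and_drop(1000);
    println!("build: {build_time:?}, drop: {drop_time:?}");
}
```

If both numbers are already large on one thread, the cost sits in the allocator and the Drop for Vec rather than in rayon's scheduling.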
@cuviper I also think that alloc and dealloc might be the issue, but I didn't expect creating nested vectors to have this much overhead. I will check with a profiler though.
Check the code now; I have simplified it further. It turns out recursion was not the problem.
On WSL2 Ubuntu 22.04 it runs fine, but I can't generate a flamegraph (using cargo-flamegraph and inferno) from perf.data due to IO/CPU overload and out-of-order events. I think there is some issue with how Windows allocates data or with the Rust binary generated on Windows.
Posted the issue in the Rust repo: rust-lang/rust#123447
It turns out HeapAlloc was the culprit, as mentioned in the issue I posted in the Rust repo.
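To see how much work each `parentf` call hands to the allocator, one option is to wrap the system allocator in a counter. On Windows, Rust's default `System` allocator forwards to HeapAlloc, so these counts approximate the number of HeapAlloc calls per nested vector. This is a minimal sketch: `CountingAlloc`, `count_allocations`, and `build` are hypothetical names introduced for illustration, and the counters are process-wide, so allocations made by other threads during the measurement would be included.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards to the system allocator and counts requests.
struct CountingAlloc;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);
static BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

#[derive(Clone)]
struct DummyStruct {
    #[allow(dead_code)]
    x: i32,
}

/// Same shape as the reproduction, with the two large dimensions set to `n`.
fn build(n: usize) -> Vec<Vec<Vec<Vec<Vec<DummyStruct>>>>> {
    vec![vec![vec![vec![Vec::new(); n]; n]; 2]; 2]
}

/// Runs `f` and returns (allocation count, bytes requested) observed meanwhile.
fn count_allocations<T>(f: impl FnOnce() -> T) -> (usize, usize) {
    let allocs_before = ALLOCS.load(Ordering::Relaxed);
    let bytes_before = BYTES.load(Ordering::Relaxed);
    let value = f();
    // Keep the optimizer from discarding the value before counters are read.
    black_box(&value);
    let allocs = ALLOCS.load(Ordering::Relaxed) - allocs_before;
    let bytes = BYTES.load(Ordering::Relaxed) - bytes_before;
    drop(value);
    (allocs, bytes)
}

fn main() {
    let (allocs, bytes) = count_allocations(|| build(1000));
    println!("one nested vec: {allocs} allocations, {bytes} bytes requested");
}
```

Multiplying the printed figures by the 800 parallel calls in the reproduction gives a sense of the load placed on the process heap, which is where allocator implementations can differ sharply between platforms.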