Stack Overflows on Zip indexed par_for_each
marstaik opened this issue · comments
Hi, I am running a bunch of these various tasks in a rayon
thread pool. At some point a stack overflow in a thread becomes inevitable.
pub fn make_heightmap_test<T: Density + Send>(
    grid: &mut ArrayViewMut3<T>,
) {
    Zip::indexed(grid).par_for_each(|coords, value| {
        // stuff ...
    });
}
The last readable position I can get in the call stack is deep in rayon, but it lands in ndarray
code, in zip/mod.rs:
    /// Return an *approximation* to the max stride axis; if
    /// component arrays disagree, there may be no choice better than the
    /// others.
    fn max_stride_axis(&self) -> Axis {
        let i = if self.prefer_f() {
            self
                .dimension
                .slice()
                .iter()
                .rposition(|&len| len > 1)
                .unwrap_or(self.dimension.ndim() - 1)
        } else {
            /* corder or default */
            self
                .dimension
                .slice()
                .iter()
                .position(|&len| len > 1)
                .unwrap_or(0)
        };
        Axis(i)
    }
}
Attached is a call stack.
I can reproduce this quite consistently, but I am unsure how to provide a dump on Windows for Rust.
Please let me know if I can provide more data.
Work stealing can generally lead to large stack depths. Is this for release builds (which use significantly less stack due to inlining and better space reuse)? Did you try simply increasing the stack size of the worker threads using ThreadPoolBuilder::stack_size? This is necessary often enough just due to how Rayon's scheduler works.
This was indeed on a release build. Setting the stack size to 24 megabytes for fun seems to have fixed it. Is there documentation on what the default stack size is? Or is it machine dependent? I haven't been able to find it just searching around.
AFAIK it is OS-dependent, and I think on contemporary Linux it is 8 MB for the main thread, cf. https://unix.stackexchange.com/questions/127602/default-stack-size-for-pthreads
The main issue is that for the threads making up Rayon's thread pool, it is the much smaller default of 2 MB (which is controlled by Rust's std::thread module, cf. https://doc.rust-lang.org/stable/std/thread/index.html#stack-size), i.e. it is much easier to hit this with Rayon than without it (both due to work stealing increasing usage and a smaller limit to start with).
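To illustrate the std::thread side of this without Rayon, here is a rough sketch using only the standard library. The helper names `recurse` and `run_with_stack`, the 1 KiB per-frame buffer, and the 64 MiB stack size are all assumptions chosen for the demonstration. Each recursion level keeps a buffer alive, so a depth of 4000 needs roughly 4 MiB of stack or more, which would likely not fit in the default for spawned threads.

```rust
use std::hint::black_box;
use std::thread;

/// Hypothetical workload: every level keeps a 1 KiB buffer on the stack.
/// `black_box` keeps the optimizer from removing the buffer.
fn recurse(depth: usize) -> usize {
    let buf = black_box([0u8; 1024]);
    if depth == 0 {
        buf[0] as usize
    } else {
        // `buf` is read after the recursive call, so this is not a tail call.
        1 + recurse(depth - 1) + buf[depth % 1024] as usize
    }
}

/// Runs `recurse(depth)` on a fresh thread with an explicit stack size.
fn run_with_stack(stack_bytes: usize, depth: usize) -> usize {
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(move || recurse(depth))
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}

fn main() {
    // 64 MiB leaves generous headroom, including for unoptimized builds.
    let result = run_with_stack(64 * 1024 * 1024, 4_000);
    println!("recursion finished, result = {result}");
}
```

The same `recurse(4_000)` call on a thread created with plain `thread::spawn` would be expected to overflow, since it gets the default size. As far as I know, that default can also be raised process-wide via the `RUST_MIN_STACK` environment variable, though it does not apply to threads that set an explicit size through a builder.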