The most effective way to transform each lane for generic dimensions.
Wleter opened this issue
Hi, I appreciate this crate; it's really well thought out. I am using it in a scientific project where performance is important, so I have a question: have I implemented the transformation of each lane of a generic ndarray in the best way possible?

I ask because after adding these operations, a flamegraph of my calculations showed almost 70% of the time spent on heap allocation and freeing. I tried many changes but didn't succeed in reducing it.
The first transformation is like a generalized matrix multiplication (I am also wondering whether it is possible to multiply not a 1-D lane but a 2-D "multi-lane"):
```rust
fn matrix_transform(&mut self, array: &mut Array<Complex64, N>) {
    array.lanes_mut(Axis(self.dimension_no))
        .into_iter()
        .par_bridge()
        .for_each(|mut lane| lane.assign(&self.transformation.dot(&lane)));
}
```
and the second transformation transforms each lane with an FFT:
```rust
fn fft_transform(&mut self, array: &mut Array<Complex64, N>) {
    let dimension_size_sqrt = (self.dimension_size as f64).sqrt();
    array.lanes_mut(Axis(self.dimension_no))
        .into_iter()
        .par_bridge()
        .for_each(|mut lane| {
            let mut temp = lane.to_vec();
            self.fft.process(&mut temp);
            lane.iter_mut().zip(temp.iter()).for_each(|(dest, src)| {
                *dest = *src / dimension_size_sqrt;
            });
        });
}
```
Thanks in advance!
I haven't looked into the details, but I'd recommend first trying:
- Don't use plain iterators and `std::iter::zip`
- Use `ndarray::Zip` and its parallelization support
- Don't use `par_bridge` unless you have to; it is slow
It looks like you need a function like `general_mat_vec_mul`, which ndarray has built in, but maybe I've read the code wrong.