The most effective way to transform each lane for generic dimensions.
Wleter opened this issue
Hi, I appreciate this crate; it's really well thought out. I am using it in a scientific project where performance is important, so I have a question: have I implemented the transformation of each lane of a generic ndarray in the best way possible?

I ask because after adding these operations, a flamegraph of my calculations showed almost 70% of the time spent on heap allocation and freeing. I tried many changes but didn't succeed in reducing it.
The first transformation is like a generalized matrix multiplication (I am also wondering whether it is possible to multiply not a 1-D lane but a 2-D "multi-lane"):
```rust
fn matrix_transform(&mut self, array: &mut Array<Complex64, N>) {
    array.lanes_mut(Axis(self.dimension_no))
        .into_iter()
        .par_bridge()
        .for_each(|mut lane| lane.assign(&self.transformation.dot(&lane)));
}
```
and the second transformation transforms each lane with an FFT:
```rust
fn fft_transform(&mut self, array: &mut Array<Complex64, N>) {
    let dimension_size_sqrt = (self.dimension_size as f64).sqrt();
    array.lanes_mut(Axis(self.dimension_no))
        .into_iter()
        .par_bridge()
        .for_each(|mut lane| {
            let mut temp = lane.to_vec();
            self.fft.process(&mut temp);
            lane.iter_mut().zip(temp.iter()).for_each(|(dest, src)| {
                *dest = *src / dimension_size_sqrt;
            });
        });
}
```
Thanks in advance!
I haven't looked into the details, but I'd recommend first trying:
- Don't use plain iterators and `std::iter::zip`
- Use `ndarray::Zip` and its parallelization support
- Don't use `par_bridge` unless you have to; it is slow
It looks like you need a function like `general_mat_vec_mul`, which ndarray has built in, but maybe I've read the code wrong.