rust-ndarray / ndarray

ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations

Home Page: https://docs.rs/ndarray/

The most effective way to transform each lane for generic dimensions.

Wleter opened this issue

commented

Hi, I appreciate this crate; it's really well thought out. I am using it in a scientific project where performance is important, so I have a question:
have I implemented the transformation of each lane of a generic-dimensional array in the best way possible?

I am asking because, after adding those operations, a flamegraph of my calculations shows that almost 70% of the time is spent on heap allocation and freeing. I tried many changes, but none of them helped.

The first transformation is like a generalized matrix multiplication (I am also wondering whether it is possible to multiply not a 1D lane but a 2D "multi-lane"):

fn matrix_transform(&mut self, array: &mut Array<Complex64, N>) {
    array.lanes_mut(Axis(self.dimension_no))
        .into_iter()
        .par_bridge()
        .for_each(|mut lane| lane.assign(&self.transformation.dot(&lane)));
}

and the second transformation transforms each lane using an FFT:

fn fft_transform(&mut self, array: &mut Array<Complex64, N>) {
    let dimension_size_sqrt = (self.dimension_size as f64).sqrt();

    array.lanes_mut(Axis(self.dimension_no))
        .into_iter()
        .par_bridge()
        .for_each(|mut lane| {
            let mut temp = lane.to_vec();
            self.fft.process(&mut temp);

            lane.iter_mut().zip(temp.iter()).for_each(|(dest, src)| {
                *dest = *src / dimension_size_sqrt;
            });
        });
}
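
For reference, dividing each output by `dimension_size_sqrt` matches the unitary normalization of the discrete Fourier transform. This assumes that `self.fft.process` computes the unnormalized forward transform of a lane of length $N$ = `dimension_size`; the post does not say which FFT library or direction is used, so treat that as an assumption:

```latex
X_k = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} x_j \, e^{-2\pi i\, jk / N}, \qquad k = 0, \dots, N-1
```

With this scaling the transform preserves the norm of each lane, $\sum_k |X_k|^2 = \sum_j |x_j|^2$.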

Thanks in advance!

commented

I haven't looked into the details, but I'd recommend first trying:

  1. Don't use iterators and std::iter::zip
  2. Use ndarray::Zip and its parallelization support
  3. Avoid par_bridge unless you have to use it; it is slow

It looks like you need a function like general_mat_vec_mul, which ndarray already provides, but maybe I misread the code.
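
The per-lane allocations in the question likely come from `dot`, which returns a newly allocated array, and from `to_vec`, which allocates a new `Vec` for every lane. `general_mat_vec_mul` instead computes `y = alpha * A * x + beta * y` into an output the caller supplies. The sketch below illustrates that idea using only the standard library, not ndarray's API: the names `mat_vec_into` and `transform_lanes_in_place` are hypothetical, `f64` stands in for `Complex64`, and lanes are assumed to be contiguous chunks of a flat buffer. Since the input of the product must not alias its output, each lane is first copied into one scratch buffer that is reused for the whole pass.

```rust
/// Computes `y = m * x` for a row-major matrix `m` with `x.len()` columns,
/// writing into the caller-supplied `y` instead of returning a new vector.
fn mat_vec_into(m: &[f64], x: &[f64], y: &mut [f64]) {
    let n = x.len();
    assert!(n > 0, "lane length must be non-zero");
    assert_eq!(m.len(), y.len() * n, "matrix shape must match x and y");
    for (row, out) in m.chunks_exact(n).zip(y.iter_mut()) {
        // Dot product of one matrix row with the input vector.
        *out = row.iter().zip(x).map(|(a, b)| a * b).sum();
    }
}

/// Applies the `n x n` row-major matrix `m` to every contiguous lane of
/// length `n` in `data`, in place, reusing one scratch buffer for all lanes.
fn transform_lanes_in_place(m: &[f64], n: usize, data: &mut [f64]) {
    assert!(n > 0, "lane length must be non-zero");
    assert_eq!(m.len(), n * n, "matrix must be n x n");
    assert_eq!(data.len() % n, 0, "data must consist of whole lanes");

    // The only heap allocation of the whole pass.
    let mut scratch = vec![0.0; n];
    for lane in data.chunks_exact_mut(n) {
        // Copy the lane so the input does not alias the output.
        scratch.copy_from_slice(lane);
        // Overwrite the lane with `m * lane`.
        mat_vec_into(m, &scratch, lane);
    }
}
```

For example, with the 2 x 2 swap matrix `[0.0, 1.0, 1.0, 0.0]` and `data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]` treated as three lanes of length 2, the call `transform_lanes_in_place(&swap, 2, &mut data)` leaves `[2.0, 1.0, 4.0, 3.0, 6.0, 5.0]` in `data`. In a parallel version, one scratch buffer per worker thread would be needed; rayon's `for_each_init` is one way to set that up.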