Slow performance of mdgrappa

Question

lukasfolle opened this issue 3 years ago

Hi,

Thank you for making this awesome project! It has exactly the parts I was missing for my project.
Unfortunately, I noticed mdgrappa to be quite slow, even on a powerful PC.

Would it be possible to accelerate this function by making use of mkl or openblas?

Nicholas McKibben · Answer 1 · Tue Dec 14 2021 05:46:50 GMT+0800 (China Standard Time)

@lukasfolle you're welcome - glad you are finding it helpful!

> Would it be possible to accelerate this function by making use of mkl or openblas?

It may be -- it depends on where the bottleneck is for your particular reconstruction. I will link to #80; I just haven't had a lot of time recently to devote here. What is the size of the dataset you want to reconstruct? If you happen to know the nominal parameters you'd like to use, those would also be helpful in benchmarking and finding bottlenecks.

Lukas Folle · Answer 2 · Tue Dec 14 2021 16:20:22 GMT+0800 (China Standard Time)

Sure, I am using mdgrappa for a k-space of size 160x92x8 with an autocalibration signal of size 40x16x8, both with 15 channels, and a 5x5x5 kernel.
As a CPU I am using an AMD Ryzen 7 3700X 8-Core Processor, and a single reconstruction takes about 12 seconds.
Since I am doing this for several hundred k-spaces, the total time is considerable.

Nicholas McKibben · Answer 3 · Wed Dec 15 2021 10:54:31 GMT+0800 (China Standard Time)

3D volumes will be more challenging. I assume all the reconstructions must be done in isolation (i.e., no way to do a joint estimate/recon between all of them). The calculations are mostly passed through to numpy's BLAS/LAPACK wrappers, which are of course single-threaded and will take advantage of only 1 of your 8 cores. There is no doubt performance left on the table in the existing mdgrappa implementation, but I wonder whether running the recons in parallel with multiprocessing or joblib would be a suitable improvement for now?
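For anyone landing here later, the parallel-recon suggestion could look roughly like the sketch below. It uses the standard library's `concurrent.futures` rather than joblib, and `run_recons` and `fake_recon` are hypothetical names made up for illustration. The commented-out `mdgrappa` call at the end assumes the `kernel_size` and `coil_axis` keyword arguments, so check it against the installed pygrappa version.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def _recon_job(job):
    # Unpack one (recon, kspace, calib) tuple and run the reconstruction.
    recon, kspace, calib = job
    return recon(kspace, calib)


def run_recons(recon, kspaces, calibs, n_workers=1):
    """Run recon(kspace, calib) for every pair, optionally in parallel.

    `recon` must be picklable (a module-level function or a
    functools.partial of one) when n_workers > 1.
    """
    jobs = [(recon, k, c) for k, c in zip(kspaces, calibs)]
    if n_workers == 1:
        # Serial fallback, handy for debugging and timing comparisons.
        return [_recon_job(job) for job in jobs]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves the input order of the jobs.
        return list(pool.map(_recon_job, jobs))


def fake_recon(kspace, calib):
    # Hypothetical stand-in for a real reconstruction: fills the
    # unsampled (zero) entries with the mean of the calibration data.
    out = kspace.copy()
    out[out == 0] = calib.mean()
    return out


# With pygrappa installed, the real call might look like this
# (keyword arguments are an assumption, not verified here):
#
#   from functools import partial
#   from pygrappa import mdgrappa
#   recon = partial(mdgrappa, kernel_size=(5, 5, 5), coil_axis=-1)
#   results = run_recons(recon, kspaces, calibs, n_workers=8)
```

On Windows and macOS the call with `n_workers > 1` should sit under an `if __name__ == "__main__":` guard, because worker processes re-import the main module. Since each recon is independent, the best case is a speedup close to the number of cores, minus the cost of copying the arrays to the workers.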

Nicholas McKibben · Answer 4 · Wed Dec 15 2021 10:56:25 GMT+0800 (China Standard Time)

FYI: wheels for Windows/Mac/Linux are available from PyPI, which may make installation/upgrade a little easier.

Lukas Folle · Answer 5 · Wed Dec 15 2021 16:35:42 GMT+0800 (China Standard Time)

Yes, multiprocessing would be an option, thanks!
Can you confirm that it takes approximately the same time for you? I want to rule out that my numpy acceleration backend is set up incorrectly.
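A quick way to check the backend question locally is sketched below. `np.show_config()` prints which BLAS/LAPACK libraries numpy was built against, and the small timing helper (`time_matmul` is a hypothetical name) gives a rough number to compare between machines. Depending on the build, the linked BLAS may itself use several threads, so the timing mostly tells you whether an optimized library is in use at all.

```python
import time

import numpy as np


def time_matmul(n=1000, repeats=3):
    """Return the best wall-clock time (seconds) of an n x n matrix product."""
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b  # dispatched to the BLAS library numpy is linked against
        best = min(best, time.perf_counter() - start)
    return best


# Prints the BLAS/LAPACK build information (e.g. OpenBLAS or MKL).
np.show_config()
print(f"best 1000x1000 matmul: {time_matmul():.4f} s")
```

If the matrix product is unexpectedly slow compared with another machine of similar class, the numpy install is worth a second look. If it is fast, the time is more likely spent in the Python-level parts of the reconstruction.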

Nicholas McKibben · Answer 6 · Sat Dec 18 2021 05:41:41 GMT+0800 (China Standard Time)

> Yes, multiprocessing would be an option, thanks! Can you confirm that it takes approximately the same time for you? I want to rule out that my numpy acceleration backend is set up incorrectly.

That time doesn't raise any alarm bells for me -- it is a 3D volume relying on a mostly Python implementation with heavy use of generalized array slicing. It makes application of GRAPPA to n-dimensional data sets trivial, but this may be more costly than I originally thought.

I modified the basic_mdgrappa example to use a synthetic 160x92x8x15 data set with the autocalibration region and kernel size you specified and Rx=Ry=2 undersampling. On my machine (Ryzen 7 5800X, 8 cores) I do even worse:

INFO:find_acs.py:Took 4.02927e-05 sec to find hyper-cube
INFO:find_acs.py:Took 0.000173092 sec to find hyper-rect
INFO:root:Took 0.193907 seconds to find geometries and holes
INFO:root:Took 26.2915 seconds to train weights
INFO:root:Took 1.97092 seconds to apply weights
Took 28.500443935394287 sec

The "train weights" step is predictably the offender here.
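To give a feel for why training dominates, here is a rough sketch of the size of the calibration problem for the parameters in this thread, together with a textbook regularized least-squares solve for GRAPPA-style weights. This is not taken from the pygrappa source: `problem_size` and `solve_weights` are hypothetical helpers, the plain `lamda * I` regularization is an assumption, and with undersampling only the sampled kernel positions act as sources, so the source count below is an upper bound.

```python
import numpy as np


def problem_size(acs_shape, kernel_shape, n_coils):
    """Return (number of kernel positions in the ACS, max sources per position)."""
    n_patches = 1
    for a, k in zip(acs_shape, kernel_shape):
        # A kernel of width k fits at a - k + 1 positions along an axis of length a.
        n_patches *= a - k + 1
    max_sources = int(np.prod(kernel_shape)) * n_coils
    return n_patches, max_sources


def solve_weights(S, T, lamda=0.01):
    """Tikhonov-regularized least squares: W = (S^H S + lamda*I)^-1 S^H T.

    S: (n_patches, n_sources) source samples gathered from the ACS.
    T: (n_patches, n_targets) target samples for the same patches.
    """
    ShS = S.conj().T @ S
    ShT = S.conj().T @ T
    return np.linalg.solve(ShS + lamda * np.eye(ShS.shape[0]), ShT)


# 40x16x8 ACS, 5x5x5 kernel, 15 channels (values from this thread).
print(problem_size((40, 16, 8), (5, 5, 5), 15))  # (1728, 1875)
```

Gathering those patches from the ACS with Python-level slicing and then solving a system of this size, possibly once per distinct sampling geometry, is a plausible place for the seconds to go. Profiling the actual code would be needed to confirm where the time is spent.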

Lukas Folle · Answer 7 · Mon Dec 20 2021 17:23:40 GMT+0800 (China Standard Time)

Alright, thanks for the confirmation!