Memory Allocation Fault in LU factorization

Question

ivannz opened this issue 7 years ago

This issue was encountered here in Scikit. In short, when trying to perform linear dimensionality reduction for data analysis purposes, the script fails abruptly with a memory allocation error.

After some debugging, I managed to pin down the possible source of the problem to the LU factorization on line 258 in utils/extmath.py. The fault occurs inside the decomposition in linalg.lu (specifically in line 185 of scipy/linalg/decomp_lu.py): the execution flow passes through the f2py wrapper of dlu_c and fails somewhere inside dgetrf (line 27).

Below is a simple reproducing snippet (on OS X 10.10.5), although LU-decomposing a matrix such as this makes little sense:

import numpy as np

from scipy import linalg

# changing 46336 to 46335 avoids the fault
obj = np.ones((46336, 110), dtype=np.float64)
Q, _ = linalg.lu(obj, permute_l=True)

In python 3.6.0 (and in 2.7.13) it fails with

python(723,0x7fff77404300) malloc: *** mach_vm_map(size=18446744065123717120) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug

The value 18446744065123717120 is FFFF FFFE 003E 9000 in hex, which looks like a sign extension of an unsigned int.
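For reference, the arithmetic behind this observation can be checked in a few lines of Python. This is only a sketch that confirms the hex form and shows what the same 64 bits look like when read as a signed integer; it does not identify which computation inside dgetrf produced the value.

```python
# Size reported by mach_vm_map in the error message above.
size = 18446744065123717120

# Hex form, grouped in 16-bit chunks as in the text.
hex_digits = format(size, "016X")
grouped = " ".join(hex_digits[i:i + 4] for i in range(0, 16, 4))
print(grouped)  # FFFF FFFE 003E 9000

# Reinterpret the same 64 bits as a signed (two's complement) integer.
signed = size - 2**64 if size >= 2**63 else size
print(signed)  # -8585834496

# Upper and lower 32-bit halves of the value.
high, low = size >> 32, size & 0xFFFFFFFF
print(hex(high), low)  # 0xfffffffe 4100096
```

The upper half being almost all ones is what one would expect when a negative quantity is widened to 64 bits and then treated as an unsigned size.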

I used the following setup: python 3.6.0 (default, Dec 24 2016, 08:02:28) (in virtualenv), default numpy ('1.12.0') from pip. I tested both the latest build from git and the pip version ('0.18.1') of scipy.
System information:

System Version: OS X 10.10.5 (14F2109)
Kernel Version: Darwin 14.5.0

No manual linkage to specific linear algebra libraries was done, since according to the reference numpy/scipy automatically links with Apple's Accelerate/vecLib linear algebra libraries.

Judging from the crash report from python 2.7.13 (crash_report.txt), something odd is going on inside Apple's Accelerate.

Python 2.7.13 (default, Dec 18 2016, 07:03:34) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin

cc: @lesteve

Ilhan Polat · Answer 1 · Mon Mar 06 2017 20:45:56 GMT+0800 (China Standard Time)

Assuming the data is real valued, could you please pass the same data directly to linalg.lapack.dgetrf?

Ivan Nazarov · Answer 2 · Mon Mar 06 2017 21:12:39 GMT+0800 (China Standard Time)

Yes, np.ones(...) creates a matrix of real 1.0 values (double). This example fails with exactly the same fault:

import numpy as np

from scipy import linalg

# changing 46336 to 46335 avoids the fault
obj = np.ones((46336, 110), dtype=np.float64, order="F")
lu_, piv_, info_ = linalg.lapack.dgetrf(obj)

Changing np.float64 to np.float32 does not solve the issue.

The last 3 calls before the fault in the stack traces in the new crash report are identical to the ones in the previous crash report.

Loïc Estève · Answer 3 · Mon Mar 06 2017 21:37:14 GMT+0800 (China Standard Time)

In case it helps, from the scikit-learn issue the problem was not happening within a conda environment using MKL, so it may be an OpenBLAS problem on OSX.

Ilhan Polat · Answer 4 · Mon Mar 06 2017 21:38:58 GMT+0800 (China Standard Time)

Indeed, on my machine I have no issues.

Loïc Estève · Answer 5 · Mon Mar 06 2017 21:52:19 GMT+0800 (China Standard Time)

Yeah, this is an OSX-specific problem that only happens when using OpenBLAS. I have to say I am not sure how to further debug the problem and I am hoping someone here will have some suggestions.

Ivan Nazarov · Answer 6 · Mon Mar 06 2017 22:12:18 GMT+0800 (China Standard Time)

I am sure this has everything to do with Apple's accelerate, but I am unable to go any further. The following C++ code behaves identically to the python snippets above.

#include <math.h>

// include the clapack headers
#include <Accelerate/Accelerate.h>

int main (int argc, char const *argv[])
{

    int num_rows=46336, num_cols=110;

    // Data array for the matrix: a flat vector of num_rows*num_cols elements.
    // To access A(r,c) use A[r+c*num_rows], i.e. column-major ordering: {r1c1,r2c1,r3c1...}
    double *A = new double[num_rows*num_cols];
    for(int row=0; row<num_rows; row++) {
        for(int col=0; col<num_cols; col++) {
            A[row+col*num_rows] = 1.0;
        }
    }

    __CLPK_integer m, n, lda, info;
    m=lda=num_rows;
    n=num_cols;

    __CLPK_integer * ipiv = new __CLPK_integer[(num_rows>num_cols?num_cols:num_rows)];

    dgetrf_(&m, &n, A, &lda, ipiv, &info);

    // Release the buffers (not reached when dgetrf_ crashes).
    delete[] A;
    delete[] ipiv;

    return 0;
    // compile with: g++ file.cpp -o file -framework accelerate
}

This should be compiled with g++ file.cpp -o file -framework accelerate on a recent Mac. Changing 46336 to 46335 again removes the fault.
dgetrf on Netlib LAPACK
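As a side note on the numbers in the C++ program, here is a small Python sketch of the column-major indexing rule from its comments, together with the size of the input buffer. The helper name col_major_offset is made up for this illustration.

```python
num_rows, num_cols = 46336, 110

def col_major_offset(row, col, leading_dim):
    """Flat offset of element (row, col) in a column-major array."""
    return row + col * leading_dim

# Neighbouring rows of one column are adjacent in memory;
# stepping to the next column skips `leading_dim` elements.
assert col_major_offset(1, 0, num_rows) == 1
assert col_major_offset(0, 1, num_rows) == num_rows

n_elements = num_rows * num_cols   # number of doubles in A
n_bytes = n_elements * 8           # 8 bytes per double
print(n_elements, n_bytes)         # 5096960 40775680
```

The input matrix is therefore only about 40 MB and its element count is far below 2**31, so the enormous allocation request in the error message is not explained by the size of A itself.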

Pauli Virtanen · Answer 7 · Mon Mar 06 2017 22:18:58 GMT+0800 (China Standard Time)

Then it's likely an integer overflow inside Accelerate. Only Apple can fix bugs in Accelerate, so it should be reported to them, and this issue can be closed.

Pauli Virtanen · Answer 8 · Mon Mar 06 2017 22:22:36 GMT+0800 (China Standard Time)

Closing (looks like accelerate bug). The discussion can be continued, but I guess the simplest fix is to not use accelerate.

Loïc Estève · Answer 9 · Mon Mar 06 2017 23:20:24 GMT+0800 (China Standard Time)

> I guess the simplest fix is to not use accelerate.

I was able to reproduce the original scikit-learn issue using OpenBLAS within an Anaconda environment. Was I somehow still using accelerate?

Pauli Virtanen · Answer 10 · Tue Mar 07 2017 00:17:03 GMT+0800 (China Standard Time)

Sorry, I understood your comment above to mean that it did not occur in a non-Accelerate environment.
You can see where the problem is by running it under a debugger and checking whether the backtrace points to OpenBLAS or Accelerate.

Nevertheless, you should try a similar minimal program as above. If it fails, then it's an OpenBLAS/LAPACK issue.

Loïc Estève · Answer 11 · Tue Mar 07 2017 01:50:15 GMT+0800 (China Standard Time)

I assumed that the scipy wheels available on PyPI were built against OpenBLAS as is the case on Linux. I was wrong, on OSX the wheels are built against Accelerate, at least if I believe the output of otool -L on scipy/linalg/_fblas*.so. Pinging @matthew-brett just to confirm this.

I could run the Python snippet with OpenBLAS using conda-forge without error.

All in all it seems like an Accelerate-specific problem.

Matthew Brett · Answer 12 · Tue Mar 07 2017 01:56:43 GMT+0800 (China Standard Time)

Yes, the OSX wheels are built with Accelerate. What version of OSX are you running?

Loïc Estève · Answer 13 · Tue Mar 07 2017 06:33:45 GMT+0800 (China Standard Time)

> Yes, the OSX wheels are built with Accelerate

Thanks for confirming this, which I was not aware of.

@ivannz has opened a bug with Apple and will let us know when he gets an answer.

For completeness:
The version of OSX I am using: OSX 10.12.1, Darwin 16.1.0.

@ivannz's version is different though, so it looks like it is not restricted to only one OSX version:

System Version: OS X 10.10.5 (14F2109)
Kernel Version: Darwin 14.5.0

Matthew Brett · Answer 14 · Tue Mar 07 2017 16:19:22 GMT+0800 (China Standard Time)

Same output on

System Version: OS X 10.9.5 (13F1911)
Kernel Version: Darwin 13.4.0

and

      System Version: OS X 10.11.5 (15F34)
      Kernel Version: Darwin 15.5.0

and

      System Version: OS X 10.11.6 (15G1217)
      Kernel Version: Darwin 15.6.0

but no such error on:

      System Version: Mac OS X 10.6.8 (10K549)
      Kernel Version: Darwin 10.8.0

@ivannz - would you mind maintaining a copy of your report and any replies from Apple over at http://www.openradar.me ?

Ivan Nazarov · Answer 15 · Tue Mar 07 2017 18:32:45 GMT+0800 (China Standard Time)

Certainly! Here is the link to Open Radar rdar://30868533.

Clemens Brunner · Answer 16 · Wed May 24 2017 15:30:35 GMT+0800 (China Standard Time)

Is it possible to build the macOS wheels without the Accelerate framework? I mean, are there any alternative libraries pre-installed such as OpenBLAS? Or are we really stuck with Apple's ancient and buggy numerical libraries (with no chance for a quick fix)?

Ilhan Polat · Answer 17 · Wed May 24 2017 16:08:57 GMT+0800 (China Standard Time)

I'm hopeful that once the discussion in #6051 is finalized, we will be back on track with the freedom to choose the BLAS or LAPACK flavor. But it seems that we are stuck with Accelerate until v1.0.

Clemens Brunner · Answer 18 · Thu Jul 20 2017 15:55:07 GMT+0800 (China Standard Time)

Has anyone tested if the issue still exists in the latest macOS High Sierra beta? I'm hoping that Apple has noticed the Accelerate bug and fixed it silently...

Ralf Gommers · Answer 19 · Fri Jul 21 2017 11:09:18 GMT+0800 (China Standard Time)

Seems this was closed incorrectly, so reopening.

The two examples don't crash for me on OS X 10.12 with scipy master built against Accelerate. Not sure if that's a change in scipy or in Accelerate.

Warren Weckesser · Answer 20 · Fri Jul 21 2017 13:20:37 GMT+0800 (China Standard Time)

I just built numpy and scipy from their master branches, and the code given in the description above crashes on my late 2013 Macbook running OS X 10.9.5, with Python 3.5.2:

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.14.0.dev0+9afb77a'

In [3]: import scipy

In [4]: scipy.__version__
Out[4]: '1.0.0.dev0+85287f9'

In [5]: from scipy import linalg

In [6]: obj = np.ones((46336, 110), dtype=np.float)

In [7]: Q, _ = linalg.lu(obj, permute_l=True)
python(15120,0x7fff74a9d310) malloc: *** mach_vm_map(size=18446744065123717120) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Segmentation fault: 11

If I use numpy from Anaconda (numpy 1.12.1), I don't get the crash.

Update: Now that I've done more than just skim the full thread, I guess this comment is old news.

Clemens Brunner · Answer 21 · Fri Jul 21 2017 15:51:56 GMT+0800 (China Standard Time)

@rgommers the C++ example as well as all Python examples (using scipy master) still crash on macOS 10.12.6 (latest version released on July 19). If anyone has the preview of macOS 10.13 (High Sierra), it would be great to hear if the problem has been fixed by Apple.

In any case, I suggest adding a test to scipy that checks for this issue - WDYT?

Ilhan Polat · Answer 22 · Fri Jul 21 2017 16:28:18 GMT+0800 (China Standard Time)

@cbrnr Could you also please add the result of scipy.linalg.lapack.ilaver()

Pauli Virtanen · Answer 23 · Fri Jul 21 2017 16:30:38 GMT+0800 (China Standard Time)

It looks to me the diagnosis of the issue was correct, and was reproduced with a standalone C program --- there clearly is a bug in Accelerate, and the only way Scipy can fix it is by not using Accelerate.

Clemens Brunner · Answer 24 · Fri Jul 21 2017 16:31:53 GMT+0800 (China Standard Time)

@pv yes - unless Apple fixes the bug, which might have already happened in 10.13 Preview...

Ilhan Polat · Answer 25 · Fri Jul 21 2017 16:32:21 GMT+0800 (China Standard Time)

> and the only way Scipy can fix it is by not using Accelerate.

@pv Ahem... #6051 anyone ? :-D

Clemens Brunner · Answer 26 · Fri Jul 21 2017 16:35:56 GMT+0800 (China Standard Time)

>>> scipy.linalg.lapack.ilaver()
(3, 2, 1)
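The tuple returned by ilaver() is (major, minor, patch), i.e. LAPACK 3.2.1 here. A minimal sketch of comparing such a tuple against a required version follows; the helper name and the threshold (3, 4, 0) are placeholders chosen for illustration, not values taken from this thread.

```python
def lapack_older_than(version, minimum):
    """Return True if a (major, minor, patch) version is older than `minimum`."""
    return tuple(version) < tuple(minimum)

reported = (3, 2, 1)             # output of ilaver() shown above
hypothetical_minimum = (3, 4, 0)  # placeholder threshold, an assumption

print(lapack_older_than(reported, hypothetical_minimum))
```

Python compares tuples element by element, so no string parsing is needed.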

Pauli Virtanen · Answer 27 · Fri Jul 21 2017 16:37:04 GMT+0800 (China Standard Time)

If we add a test, we should add a check in setup.py that raises an error if trying to build against Accelerate. Otherwise, you'll just get a segfaulting test suite. I would just suggest dropping Accelerate. There's no point in spending resources of an open source project in trying to debug software provided by one of the world's biggest and most profitable companies with hundreds of thousands of employees, who seem not even willing to communicate publicly about defects in their software products.
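One way to keep a crashing reproducer from taking the whole test run down with it is to execute it in a child interpreter and inspect the exit status. The sketch below is an illustration only, not how scipy's test suite is organised; run_isolated and REPRODUCER are hypothetical names, and the reproducer itself needs numpy and scipy in the child environment.

```python
import subprocess
import sys

# Reproducer from this thread, kept as a string so it runs in a child process.
REPRODUCER = """
import numpy as np
from scipy import linalg
obj = np.ones((46336, 110), dtype=np.float64, order="F")
linalg.lapack.dgetrf(obj)
"""

def run_isolated(code, timeout=300):
    """Run `code` in a fresh interpreter and return its exit status.

    On POSIX a negative status means the child was killed by a signal
    (for example -11 for SIGSEGV), so the parent survives the crash.
    """
    proc = subprocess.run([sys.executable, "-c", code], timeout=timeout)
    return proc.returncode

# Usage:
#     status = run_isolated(REPRODUCER)
#     if status != 0:
#         print("reproducer failed with status", status)
```

A non-zero status can then be reported as an ordinary test failure instead of a segfault of the test runner.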

Clemens Brunner · Answer 28 · Fri Jul 21 2017 16:40:15 GMT+0800 (China Standard Time)

Yep, adding this test basically means that scipy is broken on macOS - I don't know how to best handle this. I guess such questions had better be discussed in #6051...

Ilhan Polat · Answer 29 · Fri Jul 21 2017 16:45:00 GMT+0800 (China Standard Time)

I think #6051 is saturated and does not reflect the intent of the discussion that happened towards the end. I don't have enough experience with Accelerate to construct meaningful sentences.

@cbrnr Would you be interested in making a separate issue for this? (By the way, ouch for the LAPACK version)

I can also try to summarize that thread if nobody wants to take it on.

Ralf Gommers · Answer 30 · Fri Jul 21 2017 16:51:58 GMT+0800 (China Standard Time)

@ilayn that would be great. What I would suggest is not to do that in a comment but in a short markdown/rest doc (put it in a gist maybe or on the wiki). That way others can contribute and we can get something with the pros/cons for each option properly summarized.

Clemens Brunner · Answer 31 · Fri Jul 21 2017 16:54:32 GMT+0800 (China Standard Time)

@ilayn it would be great if you get this doc started - I'd be happy to contribute, but I don't have time to start this off at the moment.

Ilhan Polat · Answer 32 · Fri Jul 21 2017 16:55:50 GMT+0800 (China Standard Time)

@rgommers @cbrnr Let me try.

Ilhan Polat · Answer 33 · Fri Jul 21 2017 18:22:36 GMT+0800 (China Standard Time)

@rgommers Is this gist something that you can work with? I hope I understood correctly.

Ralf Gommers · Answer 34 · Fri Jul 21 2017 18:52:50 GMT+0800 (China Standard Time)

Thanks @ilayn, that's a good start. Motivation is good as is, the second half is a bit incomplete. I would suggest this structure below the Motivation section (working it out in a bit more detail):

Options
--------

Keep supporting Accelerate as we do now
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*Pros*: least effort, building/installing SciPy on OS X is easy.

*Cons*: cannot make use of recently introduced functionality in LAPACK (e.g. gh-xxxx, ) or fix bugs that are due to Accelerate (e.g. gh-7131)

Drop Accelerate support completely
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*Pros*: can move to LAPACK 3.y.z, can remove Accelerate patching in ``scipy._build_utils``

*Cons*: some people want to use Accelerate for performance reasons (link), we force users to install another BLAS/LAPACK which is not easy right now. This could be addressed by packaging OpenBLAS as a wheel, but that's a significant amount of work too.

In particular, these are the questions to address (copy from https://github.com/scipy/scipy/pull/6051#issuecomment-290933088)

Keep using Accelerate BLAS but shadow LAPACK
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
this one is a bit harder to summarize, need to read through gh-6051. IIRC can be done in a couple of ways
- Using ``vecLibFort``, see https://github.com/scipy/scipy/pull/6051#issuecomment-225431271
- Just bundling missing + broken routines: https://github.com/scipy/scipy/pull/6051#issuecomment-225431600
- <I think there's more flavors>

Other considerations
---------------------

Which minimum LAPACK version to support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See https://github.com/scipy/scipy/pull/6051#issuecomment-265656371

ABI issues with shadowing
~~~~~~~~~~~~~~~~~~~~~~~~~
MKL also uses a g77 ABI, see https://github.com/scipy/scipy/pull/6051#issuecomment-225431914 

Rejected alternatives
---------------------
Dropping all of LAPACK into SciPy (see https://github.com/scipy/scipy/pull/6051#issuecomment-266380190).

Ralf Gommers · Answer 35 · Fri Jul 21 2017 18:55:27 GMT+0800 (China Standard Time)

Sorry I'm boarding a plane in an hour and will probably not have time for more than drive-by commenting in the next few weeks. But I think if you can work things out as far as possible along these lines, then some of the people on gh-6051 will be happy to fill in the remaining blanks.

Ilhan Polat · Answer 36 · Fri Jul 21 2017 19:12:42 GMT+0800 (China Standard Time)

While I would gladly maintain this doc somewhere, I'm really not into this topic at the required level. So I will leave it in an issue and maintainers with edit privileges can fix/modify further.

Clemens Brunner · Answer 37 · Fri Jul 21 2017 19:16:16 GMT+0800 (China Standard Time)

It would be great if everyone could just add comments and edits instead of you having to edit the document all the time. Maybe a separate issue is a good starting point for that. I'd just highlight that scipy is currently broken on macOS due to it depending on Accelerate by default. Other open-source alternatives (OpenBLAS) should be highlighted more because these issues just can't happen. MKL would be another alternative, but since it is also proprietary the same issues might occur (and probably already have occurred at some point).

Ralf Gommers · Answer 38 · Fri Jul 21 2017 19:17:07 GMT+0800 (China Standard Time)

Suggest the wiki (https://github.com/scipy/scipy/wiki). You should be able to make a new page there - much easier for multiple people to edit.

Ilhan Polat · Answer 39 · Fri Jul 21 2017 19:18:38 GMT+0800 (China Standard Time)

Ah perfect. Wiki it is.

Ilhan Polat · Answer 40 · Fri Jul 21 2017 19:48:22 GMT+0800 (China Standard Time)

Initial Wiki page is created at https://github.com/scipy/scipy/wiki/Dropping-support-for-Accelerate

Sturla Molden · Answer 41 · Fri Jul 21 2017 20:15:14 GMT+0800 (China Standard Time)

Sad as it is, it is time to drop Accelerate. Integer overflow is even present in fundamental BLAS functions like *GEMM. This means that even the proposed solutions will never work properly. Even if we build LAPACK on top of Accelerate's CBLAS it will still be unstable. There is no chance of Apple ever updating Accelerate or fixing the known bugs. Dropping Accelerate support is the only sane solution, even for NumPy. We need to focus on supporting MKL, OpenBLAS, and ATLAS.

Ilhan Polat · Answer 42 · Fri Jul 21 2017 20:43:17 GMT+0800 (China Standard Time)

@sturlamolden I've added your example to the wiki document. That's indeed unfortunate. Should I also create an issue on the NumPy side too?

Sturla Molden · Answer 43 · Fri Jul 21 2017 21:14:40 GMT+0800 (China Standard Time)

Here is an example of NumPy failing due to integer overflow in Accelerate *GEMM: numpy/numpy#5533

Sturla Molden · Answer 44 · Fri Jul 21 2017 21:20:18 GMT+0800 (China Standard Time)

Also note that ATLAS has this problem too; it is not Accelerate-specific. It is due to the helper functions catlas_dset et al.

Sturla Molden · Answer 45 · Fri Jul 21 2017 21:24:51 GMT+0800 (China Standard Time)

Here is an example of scikit-learn running into this issue:
scikit-learn/scikit-learn#4197

Andrew Jaffe · Answer 46 · Mon Jul 24 2017 22:20:20 GMT+0800 (China Standard Time)

I am not completely sure that I've read the thread carefully enough, but just in case this is useful:

On macOS 10.12.6 (i.e., latest non-beta Sierra, as of 24 July 2017), with Python 2.7.13 or Python 3.6.2 framework builds from python.org and pip-installed numpy and scipy (which should therefore use Accelerate) the original python script in the message from @ivannz at the top of this thread runs without crashing.

Similarly, the g++ code works fine when compiled with Xcode g++ [really Apple LLVM version 8.1.0 (clang-802.0.42)].

I came to this following the thread suggesting dropping Accelerate, which seems like a terrible idea to me... I think you will lose a lot of users in the scientific community who need this functionality but don't have the time to deal with extra installations...

Ilhan Polat · Answer 47 · Mon Jul 24 2017 22:31:45 GMT+0800 (China Standard Time)

@defjaf Please have a look at the arguments in the wiki page --> https://github.com/scipy/scipy/wiki/Dropping-support-for-Accelerate

Many of the problems listed there are due to this framework; just look at the examples given by sturlamolden above your comment. It is not an installation problem. OSX support is not being dropped. What changes is the library that the build links to. I don't know who we are or which users will be lost, but all will benefit from this.

Speaking as an ex-academic through and through: the scientific community has all the time in the world to download <50 MB and run two more commands in the terminal to have a more stable environment.

Andrew Jaffe · Answer 48 · Mon Jul 24 2017 22:43:47 GMT+0800 (China Standard Time)

@ilayn, sorry, I misunderstood whether @sturlamolden was citing new incidents of the original issue cropping up rather than unrelated issues.

However, for what it's worth, numpy issue 5533 does not crash my machine with the setup as above (I did not read that thread further than the first message, so apologies if that is not the "real" problem.)

In any case, I do agree that we should have an overall working set of routines...

On the other hand, I can tell you from first-hand experience (both with my own machine and attempting to debug others' setups) that external dependencies are very often a PITA to handle, and are rarely trivial, due to conflicts, upgrades, permissions, environment variables, etc. And, yes, scientists should be savvy enough to deal with this but often do not have the requisite mental bandwidth available after the first failure... So if we do abandon Accelerate, please do make the outcome as frictionless as possible...

(Also, apologies that this isn't the right thread for some of these comments...)

Ilhan Polat · Answer 49 · Mon Jul 24 2017 22:47:30 GMT+0800 (China Standard Time)

I'm building scipy on Windows machines or rather failing at it continuously I should say 😃 So, please trust me when I say that I share your concerns wholeheartedly.

Sturla Molden · Answer 50 · Mon Jul 24 2017 23:26:59 GMT+0800 (China Standard Time)

For what it is worth, the LU code in question does not fail on my system either (macOS 10.12.6, Python 3.6.1, scipy 0.18.1). This illustrates another issue with Accelerate, that fixes to the framework are distributed as OS upgrades, and not backported to previous OS versions.

However, it is still the case that the LAPACK version is old because the interface is frozen on CLAPACK which is abandonware.

I do not have enough RAM to check if the *GEMM integer overflow due to code from ATLAS is still present, but it has a simple fix: raise an exception if arrays have more than 2147483648 elements and we are using Accelerate or ATLAS.
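The guard described here could look roughly like the sketch below. It is an illustration under assumptions: the function name is hypothetical, the limit is taken to be the largest signed 32-bit value (one below the 2147483648 quoted above), and detecting whether Accelerate or ATLAS is actually in use is left out.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold

def check_int32_element_count(shape):
    """Return the element count of `shape`, or raise if it exceeds INT32_MAX."""
    n = 1
    for dim in shape:
        n *= dim
    if n > INT32_MAX:
        raise OverflowError(
            "array with %d elements exceeds the 32-bit index range" % n
        )
    return n

print(check_int32_element_count((46336, 110)))  # 5096960
```

Note that the LU reproducer from this thread passes such a check, since its matrix has only about five million elements. A guard of this kind would address the *GEMM case, not the dgetrf fault.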

I do not advocate replacing Accelerate with ATLAS for the macOS wheels, though. ATLAS is as buggy as Accelerate, and less performant. It will have to be OpenBLAS. Or, we could accept the quirks of Accelerate but build a recent LAPACK on top of it.

Clemens Brunner · Answer 51 · Tue Jul 25 2017 03:32:36 GMT+0800 (China Standard Time)

@defjaf @sturlamolden I have macOS 10.12.6 and the C++ program (compiled with Apple LLVM version 8.1.0 (clang-802.0.42)) does not work (i.e. it crashes with a segfault). Same for the Python examples.

Sturla Molden · Answer 52 · Tue Jul 25 2017 05:07:45 GMT+0800 (China Standard Time)

Oh, this is embarrassing. It seems my default SciPy was built with MKL... That is why it didn't segfault...

After rebuilding with clang and Accelerate, it does indeed fail.

(Building with GCC does not even work, because of errors in veclib.h. I guess that must have changed after the upgrade to macOS Sierra...)

Clemens Brunner · Answer 53 · Tue Jul 25 2017 14:36:00 GMT+0800 (China Standard Time)

Incidentally, openblas is now available in core Homebrew.

Andrew Jaffe · Answer 54 · Tue Jul 25 2017 16:40:14 GMT+0800 (China Standard Time)

OK, I'm confused. My installation definitely doesn't crash -- I use the wheels that get downloaded and installed with pip install numpy scipy (either pip2 or pip3) -- I know that there is a way to check how numpy and scipy are compiled but I don't remember the incantation...
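For readers looking for that incantation: both numpy and scipy expose a show_config() function that prints the build configuration, including the BLAS/LAPACK libraries they were built against. A minimal sketch follows; the wrapper name show_build_config is made up here, and the exact output format differs between versions.

```python
import importlib

def show_build_config(module_name):
    """Print a package's build configuration; return False if it is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    show = getattr(module, "show_config", None)
    if show is None:
        return False
    show()  # prints the BLAS/LAPACK build information
    return True

for name in ("numpy", "scipy"):
    if not show_build_config(name):
        print(name, "is not installed or has no show_config()")
```

On macOS this can be cross-checked with otool -L on the compiled extension modules, as mentioned earlier in the thread.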

Clemens Brunner · Answer 55 · Tue Jul 25 2017 17:35:13 GMT+0800 (China Standard Time)

Since the underlying issue is with Accelerate, the C example by @ivannz should also crash - it is strange that this is working for you. Could you double-check?

Andrew Jaffe · Answer 56 · Tue Jul 25 2017 18:03:51 GMT+0800 (China Standard Time)

@cbrnr Yes, it absolutely runs on my machine, which has the latest macOS Sierra 10.12.6 and Xcode (as I think @sturlamolden points out, the issue may be that Apple distributes fixes through OS updates). I have tried g++, clang++ and even some homebrew llvm versions.

Clemens Brunner · Answer 57 · Tue Jul 25 2017 18:07:59 GMT+0800 (China Standard Time)

But does Apple distribute fixes only to certain users/machines? I'm on the same macOS and Xcode versions...

Clemens Brunner · Answer 58 · Tue Jul 25 2017 18:09:30 GMT+0800 (China Standard Time)

Just to rule one thing out, do you have Anaconda installed? If so, please remove it (or at least everything on the path). You can never be sure what they include in their distribution (I wouldn't be surprised if they included compilers).

Andrew Jaffe · Answer 59 · Tue Jul 25 2017 18:10:27 GMT+0800 (China Standard Time)

No, this is the latest Python.org framework installs of 2.7.* and 3.6.* with pip installation of packages. I suppose it is possible that these things could vary on different machines, but I have tried on both a late 2014 Retina 5k iMac and a late 2013 Retina MacBook Pro.

Clemens Brunner · Answer 60 · Tue Jul 25 2017 18:13:02 GMT+0800 (China Standard Time)

which g++?

Andrew Jaffe · Answer 61 · Tue Jul 25 2017 18:15:18 GMT+0800 (China Standard Time)

$ which g++
g++ is /usr/bin/g++

$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer//usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

and just to prove that I'm actually running the code (nb I added some cout << ... to check):

$ g++ ./accelcrash.cpp -o accelcrash -framework accelerate
$ ./accelcrash 
about to do dgetrf_
done
$

Clemens Brunner · Answer 62 · Tue Jul 25 2017 18:51:58 GMT+0800 (China Standard Time)

Same here (only without the strange double slash in the prefix argument). Maybe for whatever reason this particular number 46336 does not cause an overflow on your machine, but some other number does. It is pretty unlikely that you have a different version of Accelerate if we're on the same OS and Xcode versions.

The whole back and forth only goes to show that Accelerate (and ATLAS) seem to have a lot of bugs that are difficult to reproduce (e.g. NumPy 5533 works for me) and/or won't be fixed in a while, so choosing OpenBLAS or MKL as primary options seems to be a good idea. And maybe I should just install Linux on my MacBook...

Sturla Molden · Answer 63 · Tue Jul 25 2017 20:58:24 GMT+0800 (China Standard Time)

@defjaf That is not really g++ but actually a symlink to clang++. Here is roughly what you should see with a real g++:

sturla$ which g++
/usr/local/gfortran/bin/g++

sturla$ g++ --version
g++ (GCC) 6.3.0
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Clemens Brunner · Answer 64 · Tue Jul 25 2017 21:00:14 GMT+0800 (China Standard Time)

@sturlamolden Apple doesn't ship gcc - all gcc binaries point to clang alternatives.

Sturla Molden · Answer 65 · Tue Jul 25 2017 21:02:18 GMT+0800 (China Standard Time)

@cbrnr Yes, that is what I meant. GCC must be installed separately.

Clemens Brunner · Answer 66 · Tue Jul 25 2017 21:04:30 GMT+0800 (China Standard Time)

OK, but on my machine the example crashes even when using clang. How is the compiler choice relevant to this discussion (sorry, the thread is getting longer and longer)?

Sturla Molden · Answer 67 · Tue Jul 25 2017 21:08:18 GMT+0800 (China Standard Time)

sturla$ clang++ file.cpp -o file -framework accelerate
sturla$ ./file
file(23769,0x7ffff128e3c0) malloc: *** mach_vm_map(size=18446744065123717120) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Segmentation fault: 11

Changing number of rows to 46335:

sturla$ clang++ file.cpp -o file -framework accelerate
sturla$ ./file
sturla$ _

Compiler version:

sturla$ clang++ --version
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Clemens Brunner · Answer 68 · Tue Jul 25 2017 21:15:13 GMT+0800 (China Standard Time)

Yes, I thought this was the point - it crashes with the default compiler (but not on @defjaf's setup, which also uses clang). What about gcc in this context? It doesn't work with accelerate and spits out compilation errors.

Clemens Brunner · Answer 69 · Tue Jul 25 2017 21:16:24 GMT+0800 (China Standard Time)

Edit: OK, g++ requires -flax-vector-conversions. No segfault.

Clemens Brunner · Answer 70 · Tue Jul 25 2017 21:20:12 GMT+0800 (China Standard Time)

What did I just do? Now the C example works with both clang++ and g++... 😱 Did I install any library update with Homebrew? Could this be not related to Accelerate after all?

Andrew Jaffe · Answer 71 · Tue Jul 25 2017 21:45:40 GMT+0800 (China Standard Time)

@cbrnr what about the python version?

Sturla Molden · Answer 72 · Tue Jul 25 2017 21:50:31 GMT+0800 (China Standard Time)

sturla$ clang++ -o file file.cpp -flax-vector-conversions -framework accelerate
sturla$ ./file
file(24274,0x7ffff128e3c0) malloc: *** mach_vm_map(size=18446744065123717120) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Segmentation fault: 11

Clemens Brunner · Answer 73 · Tue Jul 25 2017 21:54:32 GMT+0800 (China Standard Time)

@defjaf Python works too. 😕 I have no idea what's going on...
@sturlamolden I meant -flax-vector-conversions is required to compile with g++.

Andrew Jaffe · Answer 74 · Tue Jul 25 2017 22:11:22 GMT+0800 (China Standard Time)

Perhaps this verbose compiler output might help find any differences in our setups? (Note that the interesting lines are the long ones starting with quote marks where the compilation and linking actually happens...)

$ clang++ -v ./accelcrash.cpp -o accelcrash -framework accelerate
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang" -cc1 -triple x86_64-apple-macosx10.12.0 -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name accelcrash.cpp -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu penryn -target-linker-version 278.4 -v -dwarf-column-info -debugger-tuning=lldb -resource-dir /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.1.0 -stdlib=libc++ -fdeprecated-macro -fdebug-compilation-dir /Users/jaffe/home/python -ferror-limit 19 -fmessage-length 80 -stack-protector 1 -fblocks -fobjc-runtime=macosx-10.12.0 -fencode-extended-block-signature -fcxx-exceptions -fexceptions -fmax-type-align=16 -fdiagnostics-show-option -fcolor-diagnostics -o /var/folders/nc/n7d98d0x46g0h5yyb4bc1gv40000gv/T/accelcrash-b0a9ed.o -x c++ ./accelcrash.cpp
clang -cc1 version 8.1.0 (clang-802.0.42) default target x86_64-apple-darwin16.7.0
ignoring nonexistent directory "/usr/include/c++/v1"
#include "..." search starts here:
#include <...> search starts here:
 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1
 /usr/local/include
 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.1.0/include
 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include
 /usr/include
 /System/Library/Frameworks (framework directory)
 /Library/Frameworks (framework directory)
End of search list.
 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld" -demangle -lto_library /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/libLTO.dylib -no_deduplicate -dynamic -arch x86_64 -macosx_version_min 10.12.0 -o accelcrash /var/folders/nc/n7d98d0x46g0h5yyb4bc1gv40000gv/T/accelcrash-b0a9ed.o -framework accelerate -lc++ -lSystem /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.1.0/lib/darwin/libclang_rt.osx.a

Clemens Brunner · Answer 75 · Tue Jul 25 2017 22:19:11 GMT+0800 (China Standard Time)

@defjaf my output looks identical. Note that now all of a sudden it works (just like on your machine) and I can't go back to when it didn't.

Andrew Jaffe · Answer 76 · Tue Jul 25 2017 22:20:47 GMT+0800 (China Standard Time)

Yep, this was more directed at @sturlamolden: what about you, since you're still seeing the crash(es)?...

Ilhan Polat · Answer 77 · Tue Jul 25 2017 22:21:52 GMT+0800 (China Standard Time)

Apparently, you have accidentally Microsoft-ified your machines. That's my default daily headscratching position. Things work, based on planets' alignment and weather forecast.

Sturla Molden · Answer 78 · Tue Jul 25 2017 22:26:39 GMT+0800 (China Standard Time)

sturla$ $ clang++ -v ./file.cpp -o file -flax-vector-conversions -framework accelerate
-bash: $: command not found

Sturla Molden · Answer 79 · Tue Jul 25 2017 22:32:00 GMT+0800 (China Standard Time)

Oops...

sturla$ clang++ -v -o file file.cpp -flax-vector-conversions -framework accelerate
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang" -cc1 -triple x86_64-apple-macosx10.12.0 -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name file.cpp -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu penryn -target-linker-version 278.4 -v -dwarf-column-info -debugger-tuning=lldb -resource-dir /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.1.0 -stdlib=libc++ -fdeprecated-macro -fdebug-compilation-dir /Users/sturla/development/lutest -ferror-limit 19 -fmessage-length 80 -stack-protector 1 -fblocks -fobjc-runtime=macosx-10.12.0 -fencode-extended-block-signature -fcxx-exceptions -fexceptions -fmax-type-align=16 -fdiagnostics-show-option -fcolor-diagnostics -o /var/folders/yl/_1z15lyj0w95_0tpv51sh47m0000gn/T/file-b0d43d.o -x c++ file.cpp
clang -cc1 version 8.1.0 (clang-802.0.42) default target x86_64-apple-darwin16.7.0
ignoring nonexistent directory "/usr/include/c++/v1"
#include "..." search starts here:
#include <...> search starts here:
 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1
 /usr/local/include
 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.1.0/include
 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include
 /usr/include
 /System/Library/Frameworks (framework directory)
 /Library/Frameworks (framework directory)
End of search list.
 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld" -demangle -lto_library /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/libLTO.dylib -no_deduplicate -dynamic -arch x86_64 -macosx_version_min 10.12.0 -o file /var/folders/yl/_1z15lyj0w95_0tpv51sh47m0000gn/T/file-b0d43d.o -framework accelerate -lc++ -lSystem /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.1.0/lib/darwin/libclang_rt.osx.a

Clemens Brunner · Answer 80 · Tue Jul 25 2017 22:33:38 GMT+0800 (China Standard Time)

@ilayn 🤣 in reality, we should all be using Linux for scientific computing (and actually I am, but just not on my MacBook)...

Andrew Jaffe · Answer 81 · Tue Jul 25 2017 22:36:35 GMT+0800 (China Standard Time)

@sturlamolden No differences other than filenames...

Sturla Molden · Answer 82 · Tue Jul 25 2017 22:37:48 GMT+0800 (China Standard Time)

@ilayn It certainly feels that way after the upgrade to Sierra. Now my gfortran does not work because the assembler (as) does not understand the -m flag.

Clemens Brunner · Answer 83 · Tue Jul 25 2017 22:45:39 GMT+0800 (China Standard Time)

@sturlamolden plain -m? This flag does not exist on Linux either.

Anyway, this discussion is digressing too far from the main issue here. But it's clear that there are some major issues popping up here and there. I'm not sure this helps with the decision on how SciPy should proceed with official macOS builds.

Andrew Jaffe · Answer 84 · Tue Jul 25 2017 22:47:28 GMT+0800 (China Standard Time)

@sturlamolden @cbrnr I agree (but nb. gfortran from homebrew works fine on my machine... perhaps this points to something brittle in your setup?)

Ivan Nazarov · Answer 85 · Wed Aug 30 2017 17:07:30 GMT+0800 (China Standard Time)

I guess Apple hasn't fixed the issue. On the latest Sierra

      System Version: macOS 10.12.6 (16G29)
      Kernel Version: Darwin 16.7.0

I get the segfault for these new constants (I just ran a binary search from 46335 upward to find the new failure threshold):

    int num_rows=46344, num_cols=110;

With num_rows = 46343 it runs fine; changing it to 46344 or any higher value causes a segfault. Changing 110 to a higher value while keeping num_rows = 46343 does not cause a segfault.
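The threshold search described above can be sketched as a plain bisection. This is only an illustration under stated assumptions: the names first_failing and lu_crashes are hypothetical, the failure is assumed to be monotone in num_rows (every value above the threshold fails), and the probe runs the Python reproducer from the top of the thread in a child process, because a segfault would otherwise kill the interpreter doing the search.

```python
import subprocess
import sys


def first_failing(lo, hi, fails):
    """Return the smallest n in (lo, hi] for which fails(n) is True.

    Assumes fails(lo) is False, fails(hi) is True, and that failure is
    monotone in n (an assumption, not something the thread proves).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid  # threshold is at mid or below
        else:
            lo = mid  # threshold is above mid
    return hi


def lu_crashes(num_rows, num_cols=110):
    """Hypothetical probe: run the LU reproducer in a child process.

    Any non-zero exit status counts as a failure, so a missing scipy
    install would also be reported as a "crash".
    """
    code = (
        "import numpy as np; from scipy import linalg; "
        f"linalg.lu(np.ones(({num_rows}, {num_cols})), permute_l=True)"
    )
    return subprocess.run([sys.executable, "-c", code]).returncode != 0
```

A search over the range mentioned in the thread would then be first_failing(46335, 60000, lu_crashes), provided 46335 passes and 60000 fails on the machine under test.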

I replicated this behaviour both with the standalone C code and the python example at the top of the thread.

PS: There has been no activity around my bug report to Apple so far.

Pauli Virtanen · Answer 86 · Wed Aug 30 2017 17:16:34 GMT+0800 (China Standard Time)

Closing, there's nothing we can do aside from dropping Accelerate.

Clemens Brunner · Answer 87 · Tue Sep 26 2017 23:49:03 GMT+0800 (China Standard Time)

I just upgraded to macOS High Sierra (10.13) and the bug seems to have been fixed by Apple. All examples above (both Python and C++) work! 🎉

Ivan Nazarov · Answer 88 · Wed Sep 27 2017 03:15:28 GMT+0800 (China Standard Time)

Today I got an email from the developers, asking to test if the bug persists in High Sierra 10.13 GM.

This is a follow-up regarding Bug ID# 30868533.

Please verify this issue with the macOS 10.13 GM and update your bug report at https://bugreport.apple.com/ with your results.

macOS High Sierra 10.13 GM (17A365)
https://developer.apple.com/download/
Posted Date: Sep 25, 2017

If the issue persists, please attach a new sysdiagnose captured in the latest build and attach it to the bug report.

macOS sysdiagnose Instructions:
https://developer.apple.com/services-account/download?path=/OS_X/OS_X_Logs/sysdiagnose_Logging_Instructions.pdf

Unfortunately, I won't be able to test this issue in the near future myself.

@cbrnr Could I ask you, please, if it is not too inconvenient, to try increasing the num_rows variable here

int num_rows=46344, num_cols=110;

For example, could you try 60000 or higher and, if it ever fails, perhaps use binary search to pinpoint the threshold of failure? Again, I apologise for not being able to do this test myself.

Sturla Molden · Answer 89 · Wed Sep 27 2017 03:19:22 GMT+0800 (China Standard Time)

Downloading High Sierra now, still half an hour to go. :)

Sturla Molden · Answer 90 · Wed Sep 27 2017 05:12:37 GMT+0800 (China Standard Time)

Indeed, the bug seems to be fixed in High Sierra 😃

Clemens Brunner · Answer 91 · Wed Sep 27 2017 14:41:19 GMT+0800 (China Standard Time)

@ivannz @sturlamolden it works up to about num_rows=5925000, but a little higher and it crashes due to a malloc error. I think this is expected on my 16GB machine, right?

Sturla Molden · Answer 92 · Wed Sep 27 2017 19:53:58 GMT+0800 (China Standard Time)

Hm, I can confirm that raising num_rows triggers the error again. At this point the data array is a little less than 5 GB.
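As a rough check of the "a little less than 5 GB" figure, the array size at the reported limit can be computed directly. This sketch assumes 110 columns and 8-byte double-precision elements, as in the earlier reproducers; array_bytes is a hypothetical helper name.

```python
def array_bytes(num_rows, num_cols, itemsize=8):
    """Size in bytes of a dense num_rows x num_cols array.

    itemsize=8 assumes double-precision (float64) elements.
    """
    return num_rows * num_cols * itemsize


size = array_bytes(5_925_000, 110)
print(size)          # 5214000000 bytes
print(size / 2**30)  # roughly 4.86 when expressed in GiB
```

So the figure matches when "GB" is read as GiB (2**30 bytes); in decimal gigabytes the same array is about 5.2 GB.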

Clemens Brunner · Answer 93 · Wed Sep 27 2017 20:07:00 GMT+0800 (China Standard Time)

And with MKL it actually works with these large matrices... So 👎 for this "fix"...

Emilio Ramirez · Answer 94 · Fri Dec 01 2017 00:05:44 GMT+0800 (China Standard Time)

@cbrnr how do you use MKL on macOS Sierra 10.12.6? I tried to install it with pip install numpy-mkl, but pip returns the error "Could not find a version that satisfies the requirement numpy-mkl". It's weird, because numpy-mkl is listed when I search for it with pip 😖

Any ideas?

Clemens Brunner · Answer 95 · Fri Dec 01 2017 00:19:01 GMT+0800 (China Standard Time)

I use it with Anaconda, but I'd be highly interested if it is possible to install it so that I can use MKL with my Homebrew Python.

Chris Rorden · Answer 96 · Sun Jan 10 2021 21:01:38 GMT+0800 (China Standard Time)

@ivannz and @matthew-brett the sample C code provided with this issue shows no problem on the Apple M1. The Apple Accelerate framework is able to leverage the dedicated AMX hardware on the M1. This can provide superior performance for some functions relative to OpenBLAS.

Will SciPy and numpy ever reconsider using Accelerate? I think the strongest argument on the SciPy page is the opaque support Apple has for their tools, making discovery and resolution of bugs hard to track. However, it would be great if the Apple Developer Ecosystem Engineering team could be engaged with core tools like SciPy and numpy in the same way they have helped port Intel's Embree to Apple Silicon.

Clemens Brunner · Answer 97 · Sun Jan 10 2021 21:46:01 GMT+0800 (China Standard Time)

I'd be 💯 for this! Can we tag someone at Apple who might be able to chime in?

Ralf Gommers · Answer 98 · Sun Jan 10 2021 22:14:50 GMT+0800 (China Standard Time)

Will SciPy and numpy ever reconsider using Accelerate?

If Apple is interested and addresses the issues outlined at https://github.com/scipy/scipy/wiki/Dropping-support-for-Accelerate, then of course we can reconsider. We support MKL as well which is not open source, because Intel maintains it well and when there's an issue we have actual humans to talk to.

I think the strongest argument on the SciPy page is the opaque support Apple has for their tools, making discovery and resolution of bugs hard to track.

Accelerate being stuck on a too-low version of LAPACK (3.2.1 last time I checked) is also a real blocker. Our minimum is 3.4.1 at the moment I believe, and we may gradually increase that minimum. A lowest version that's from 2012 isn't too much to ask.

However, it would be great if the Apple Developer Ecosystem Engineering team could be engaged with core tools like SciPy and numpy in the same way they have helped port Intel's Embree to Apple Silicon.

I'd be 💯 for this! Can we tag someone at Apple who might be able to chime in?

@lawrence-danna-apple given that you worked on M1 support for Python packaging, maybe you have an opinion on this topic or know who to ping?

Sturla Molden · Answer 99 · Sun Jan 10 2021 22:22:55 GMT+0800 (China Standard Time)

The main issues are:

Accelerate is not fork safe due to the GCD (libdispatch) threadpool.
Accelerate does not support 64-bit integers in BLAS.

The LAPACK version in Accelerate is not really an issue, as we can build LAPACK from Fortran sources on top of BLAS (like we do for OpenBLAS). Accelerate also uses f2c ABI, which is annoying, but not a complete blocker.
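To make the 64-bit integer point concrete: a BLAS that only offers 32-bit integer arguments (the LP64 convention) cannot be passed a dimension or leading dimension larger than 2**31 - 1. The sketch below is a caller-side check only; fits_int32_dims is a hypothetical name, and it says nothing about how Accelerate computes offsets internally.

```python
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def fits_int32_dims(*dims):
    """True if every dimension argument fits in a signed 32-bit integer.

    This only checks what a caller may pass through a 32-bit (LP64)
    BLAS interface; it is not a statement about library internals.
    """
    return all(0 <= d <= INT32_MAX for d in dims)


print(fits_int32_dims(46344, 110))  # True
print(fits_int32_dims(2**31, 110))  # False
```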

Ilhan Polat · Answer 100 · Sun Jan 10 2021 22:30:03 GMT+0800 (China Standard Time)

I wholeheartedly agree that M1 users should be able to join the number-crunching community if Accelerate is overhauled. However, I am not sure whether Apple would bother, since the framework contains not just BLAS/LAPACK but all kinds of extras that would need redeveloping.

Since even the LAPACK devs have moved to GitHub and started a much quicker release cycle, turtle-speed, Cold War-style responses won't cut it if we are to sustain a maintenance effort, so there are many meta issues as well as technical ones.