scipy / scipy

SciPy library main repository

Home Page:https://scipy.org

Memory Allocation Fault in LU factorization

ivannz opened this issue · comments

This issue was encountered here in scikit-learn. In short, when trying to perform linear dimensionality reduction for data analysis purposes, the script fails abruptly with a memory allocation error.

After some debugging, I managed to pin down the possible source of the problem to the LU factorization on line 258 in utils/extmath.py. The fault occurs inside the decomposition in linalg.lu (specifically in line 185 of scipy/linalg/decomp_lu.py): the execution flow passes through the f2py wrapper of dlu_c and fails somewhere inside dgetrf (line 27).

Below is a simple reproducing snippet (on OS X 10.10.5), although LU-decomposing a matrix such as this makes little sense:

import numpy as np

from scipy import linalg

# changing 46336 to 46335 avoids the fault
obj = np.ones((46336, 110), dtype=np.float64)
Q, _ = linalg.lu(obj, permute_l=True)

In python 3.6.0 (and in 2.7.13) it fails with

python(723,0x7fff77404300) malloc: *** mach_vm_map(size=18446744065123717120) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug

The value 18446744065123717120 is FFFF FFFE 003E 9000 in hex, which looks like a sign extension of an unsigned int.
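A quick way to check the hex reading of that size, and to see what the same bit pattern means as a signed 64-bit integer, is the short Python sketch below. The helper name as_signed64 is made up for illustration; only the size value comes from the malloc message above.

```python
# Size reported by malloc in the error message above.
size = 18446744065123717120


def as_signed64(value):
    """Reinterpret an unsigned 64-bit bit pattern as a two's-complement signed integer."""
    return value - 2**64 if value >= 2**63 else value


hex_repr = hex(size)
signed = as_signed64(size)

print(hex_repr)  # 0xfffffffe003e9000
print(signed)    # -8585834496
```

Read as a signed value the request is negative, which is consistent with a size computation that went negative before reaching the allocator. Where exactly that happens is not established here.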

I used the following setup: python 3.6.0 (default, Dec 24 2016, 08:02:28) (in virtualenv), default numpy ('1.12.0') from pip. I tested both the latest build from git and the pip version ('0.18.1') of scipy.
System information:

System Version: OS X 10.10.5 (14F2109)
Kernel Version: Darwin 14.5.0

No manual linkage to specific linear algebra libraries was done, since according to the reference numpy/scipy automatically links with Apple's Accelerate/vecLib linear algebra libraries.

Judging from the crash report from Python 2.7.13 (crash_report.txt), something odd is going on inside Apple's Accelerate.

Python 2.7.13 (default, Dec 18 2016, 07:03:34) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin

cc: @lesteve

Assuming the data is real valued, could you please pass the same data directly to linalg.lapack.dgetrf?

Yes, np.ones(...) creates a matrix of real 1.0 values (double). This example fails with exactly the same fault:

import numpy as np

from scipy import linalg

# changing 46336 to 46335 avoids the fault
obj = np.ones((46336, 110), dtype=np.float64, order="F")
lu_, piv_, info_ = linalg.lapack.dgetrf(obj)

Changing np.float64 to np.float32 does not solve the issue.

The last 3 calls before the fault in the stack traces in the new crash report are identical to the ones in the previous crash report.

In case it helps, from the scikit-learn issue the problem was not happening within a conda environment using MKL, so it may be an OpenBLAS problem on OSX.

Indeed, on my machine I have no issues.

Yeah, this is an OSX-specific problem that only happens when using OpenBLAS. I have to say I am not sure how to further debug the problem and I am hoping someone here will have some suggestions.

I am sure this has everything to do with Apple's Accelerate, but I am unable to go any further. The following C++ code behaves identically to the Python snippets above.

#include <math.h>

// include the clapack headers
#include <Accelerate/Accelerate.h>

int main (int argc, char const *argv[])
{

    int num_rows=46336, num_cols=110;

    // Create the data array for our matrix: a vector of num_rows*num_cols elements.
    // To access A(r,c) use A[r+c*num_rows], in other words column-major ordering: {r1c1,r2c1,r3c1...}
    double *A = new double[num_rows*num_cols];
    for(int row=0; row<num_rows; row++) {
        for(int col=0; col<num_cols; col++) {
            A[row+col*num_rows] = 1.0;
        }
    }

    __CLPK_integer m, n, lda, info;
    m=lda=num_rows;
    n=num_cols;

    __CLPK_integer * ipiv = new __CLPK_integer[(num_rows>num_cols?num_cols:num_rows)];

    dgetrf_(&m, &n, A, &lda, ipiv, &info);

    return 0;
/* g++ file.cpp -o file -framework accelerate */
}

This should be compiled with g++ file.cpp -o file -framework accelerate on a recent Mac. Changing 46336 to 46335 again removes the fault.
dgetrf on Netlib LAPACK

Closing (looks like an Accelerate bug). The discussion can be continued, but I guess the simplest fix is to not use Accelerate.

I guess the simplest fix is to not use Accelerate.

I was able to reproduce the original scikit-learn issue using OpenBLAS within an Anaconda environment. Was I somehow still using accelerate?

Sorry, I understood your comment above as saying that it did not occur in a non-Accelerate environment.
You can see where the problem is by running it under a debugger and checking whether the backtrace points to OpenBLAS or Accelerate.

Nevertheless, you should try a similar minimal program as above. If it fails, then it's an OpenBLAS/LAPACK issue.

I assumed that the scipy wheels available on PyPI were built against OpenBLAS as is the case on Linux. I was wrong, on OSX the wheels are built against Accelerate, at least if I believe the output of otool -L on scipy/linalg/_fblas*.so. Pinging @matthew-brett just to confirm this.

I could run the Python snippet with OpenBLAS using conda-forge without error.

All in all it seems like an Accelerate-specific problem.

Yes, the OSX wheels are built with Accelerate. What version of OSX are you running?

Yes, the OSX wheels are built with Accelerate

Thanks for confirming this, which I was not aware of.

@ivannz has opened a bug with Apple and will let us know when he gets an answer.

For completeness:
The version of OSX I am using: OSX 10.12.1, Darwin 16.1.0.

@ivannz's version is different though, so it looks like it is not restricted to only one OSX version:

System Version: OS X 10.10.5 (14F2109)
Kernel Version: Darwin 14.5.0

Same output on

System Version: OS X 10.9.5 (13F1911)
Kernel Version: Darwin 13.4.0

and

      System Version: OS X 10.11.5 (15F34)
      Kernel Version: Darwin 15.5.0

and

      System Version: OS X 10.11.6 (15G1217)
      Kernel Version: Darwin 15.6.0

but no such error on:

      System Version: Mac OS X 10.6.8 (10K549)
      Kernel Version: Darwin 10.8.0

@ivannz - would you mind maintaining a copy of your report and any replies from Apple over at http://www.openradar.me ?

Certainly! Here is the link to Open Radar rdar://30868533.

Is it possible to build the macOS wheels without the Accelerate framework? I mean, are there any alternative libraries pre-installed such as OpenBLAS? Or are we really stuck with Apple's ancient and buggy numerical libraries (with no chance for a quick fix)?

I'm hopeful that once the discussion in #6051 is finalized, we will be back on track with the freedom of choosing BLAS or LAPACK flavor. But it seems that we are stuck with Accelerate until v1.0.

Has anyone tested if the issue still exists in the latest macOS High Sierra beta? I'm hoping that Apple has noticed the Accelerate bug and fixed it silently...

Seems this was closed incorrectly, so reopening.

The two examples don't crash for me on OS X 10.12 with scipy master built against Accelerate. Not sure if that's a change in scipy or in Accelerate.

I just built numpy and scipy from their master branches, and the code given in the description above crashes on my late 2013 Macbook running OS X 10.9.5, with Python 3.5.2:

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.14.0.dev0+9afb77a'

In [3]: import scipy

In [4]: scipy.__version__
Out[4]: '1.0.0.dev0+85287f9'

In [5]: from scipy import linalg

In [6]: obj = np.ones((46336, 110), dtype=np.float)

In [7]: Q, _ = linalg.lu(obj, permute_l=True)
python(15120,0x7fff74a9d310) malloc: *** mach_vm_map(size=18446744065123717120) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Segmentation fault: 11

If I use numpy from Anaconda (numpy 1.12.1), I don't get the crash.


Update: Now that I've done more than just skim the full thread, I guess this comment is old news.

@rgommers the C++ example as well as all Python examples (using scipy master) still crash on macOS 10.12.6 (latest version released on July 19). If anyone has the preview of macOS 10.13 (High Sierra), it would be great to hear if the problem has been fixed by Apple.

In any case, I suggest adding a test to scipy that checks for this issue - WDYT?
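One possible shape for such a test, sketched under the assumption that a crashing LAPACK call should not take the whole test run down with it, is to execute the reproducer in a child interpreter and inspect its exit status. The helper name runs_cleanly and the REPRODUCER string are illustrative and not part of scipy's test suite.

```python
import subprocess
import sys

# The reproducer from the top of this thread, as source for a child process.
REPRODUCER = """
import numpy as np
from scipy import linalg
obj = np.ones((46336, 110), dtype=np.float64)
linalg.lu(obj, permute_l=True)
"""


def runs_cleanly(code, timeout=600):
    """Run `code` in a fresh interpreter and return True if it exits with status 0.

    A segfault in the child shows up as a non-zero (on POSIX, negative)
    return code instead of killing the calling process.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return result.returncode == 0
```

A test could then assert runs_cleanly(REPRODUCER), provided numpy and scipy are importable in the child interpreter. On an affected system that assertion would fail rather than crash the test runner.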

@cbrnr Could you also please add the result of scipy.linalg.lapack.ilaver()

@pv yes - unless Apple fixes the bug, which might have already happened in 10.13 Preview...

and the only way Scipy can fix it is by not using Accelerate.

@pv Ahem... #6051 anyone ? :-D

>>> scipy.linalg.lapack.ilaver()
(3, 2, 1)

Yep, adding this test basically means that scipy is broken on macOS - I don't know how to best handle this. I guess such questions had better be discussed in #6051...

I think #6051 is saturated and does not reflect the intent of the discussion that happened towards the end. I don't have enough experience with Accelerate to construct meaningful sentences.

@cbrnr Would you be interested in making separate issue for this ? (By the way, ouch for the LAPACK version)

I can also try to summarize that thread if nobody wants to take it on

@ilayn that would be great. What I would suggest is not to do that in a comment but in a short markdown/rest doc (put it in a gist maybe or on the wiki). That way others can contribute and we can get something with the pros/cons for each option properly summarized.

@ilayn it would be great if you get this doc started - I'd be happy to contribute, but I don't have time to start this off at the moment.

@rgommers Is this gist something that you can work with? I hope I understood correctly.

Thanks @ilayn, that's a good start. Motivation is good as is, the second half is a bit incomplete. I would suggest this structure below the Motivation section (working it out in a bit more detail):

Options
--------

Keep supporting Accelerate as we do now
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*Pros*: least effort, building/installing SciPy on OS X is easy.

*Cons*: cannot make use of recently introduced functionality in LAPACK (e.g. gh-xxxx, ) or fix bugs that are due to Accelerate (e.g. gh-7131)

Drop Accelerate support completely
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*Pros*: can move to LAPACK 3.y.z, can remove Accelerate patching in ``scipy._build_utils``

*Cons*: some people want to use Accelerate for performance reasons (link), we force users to install another BLAS/LAPACK which is not easy right now. This could be addressed by packaging OpenBLAS as a wheel, but that's a significant amount of work too.

In particular, these are the questions to address (copy from https://github.com/scipy/scipy/pull/6051#issuecomment-290933088)

Keep using Accelerate BLAS but shadow LAPACK
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
this one is a bit harder to summarize, need to read through gh-6051. IIRC can be done in a couple of ways
- Using ``vecLibFort``, see https://github.com/scipy/scipy/pull/6051#issuecomment-225431271
- Just bundling missing + broken routines: https://github.com/scipy/scipy/pull/6051#issuecomment-225431600
- <I think there's more flavors>

Other considerations
---------------------

Which minimum LAPACK version to support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See https://github.com/scipy/scipy/pull/6051#issuecomment-265656371

ABI issues with shadowing
~~~~~~~~~~~~~~~~~~~~~~~~~
MKL also uses a g77 ABI, see https://github.com/scipy/scipy/pull/6051#issuecomment-225431914 

Rejected alternatives
---------------------
Dropping all of LAPACK into SciPy (see https://github.com/scipy/scipy/pull/6051#issuecomment-266380190).

Sorry I'm boarding a plane in an hour and will probably not have time for more than drive-by commenting in the next few weeks. But I think if you can work things out as far as possible along these lines, then some of the people on gh-6051 will be happy to fill in the remaining blanks.

While I would gladly maintain this doc somewhere, I'm really not into this topic at the required level. So I will leave it in an issue and maintainers with edit privileges can fix/modify further.

It would be great if everyone could just add comments and edits instead of you having to edit the document all the time. Maybe a separate issue is a good starting point for that. I'd just highlight that scipy is currently broken on macOS due to it depending on Accelerate by default. Other open-source alternatives (OpenBLAS) should be highlighted more because these issues just can't happen. MKL would be another alternative, but since it is also proprietary the same issues might occur (and probably already have occurred at some point).

Suggest the wiki (https://github.com/scipy/scipy/wiki). You should be able to make a new page there - much easier for multiple people to edit.

Ah perfect. Wiki it is.

Sad as it is, it is time to drop Accelerate. Integer overflow is even present in fundamental BLAS functions like *GEMM. This means that even the proposed solutions will never work properly. Even if we build LAPACK on top of Accelerate's CBLAS it will still be unstable. There is no chance of Apple ever updating Accelerate or fixing the known bugs. Dropping Accelerate support is the only sane solution, even for NumPy. We need to focus on supporting MKL, OpenBLAS, and ATLAS.
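To make the failure mode concrete, here is a generic illustration of a dimension product wrapping around in a signed 32-bit integer. It does not reproduce Accelerate's code, which is closed. The 50000 x 50000 shape and the helper name product_as_int32 are chosen only for the example.

```python
import ctypes


def product_as_int32(rows, cols):
    """Return rows * cols as it would appear after truncation to a signed 32-bit int."""
    # ctypes integer types do no overflow checking, so the value simply wraps.
    return ctypes.c_int32(rows * cols).value


exact = 50000 * 50000                     # fits easily in 64 bits
wrapped = product_as_int32(50000, 50000)  # exceeds 2**31 - 1, so it wraps

print(exact)    # 2500000000
print(wrapped)  # -1794967296
```

A negative count like this, if later used as a size or an index, can lead to the kind of bogus allocation or out-of-bounds access described in this thread.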

@sturlamolden I've added your example to the wiki document. That's indeed unfortunate. Should I also create an issue on the NumPy side too?

Here is an example of NumPy failing due to integer overflow in Accelerate *GEMM: numpy/numpy#5533

Also note that ATLAS has this problem, it is not Accelerate specific. It is due to the helper functions catlas_dset et al.

Here is an example of scikit-learn running into this issue:
scikit-learn/scikit-learn#4197

I am not completely sure that I've read the thread carefully enough, but just in case this is useful:

On macOS 10.12.6 (i.e., latest non-beta Sierra, as of 24 July 2017), with Python 2.7.13 or Python 3.6.2 framework builds from python.org and pip-installed numpy and scipy (which should therefore use Accelerate) the original python script in the message from @ivannz at the top of this thread runs without crashing.

Similarly, the g++ code works fine when compiled with Xcode g++ [really Apple LLVM version 8.1.0 (clang-802.0.42)].

I came to this following the thread suggesting dropping Accelerate, which seems like a terrible idea to me... I think you will lose a lot of users in the scientific community who need this functionality but don't have the time to deal with extra installations...

@defjaf Please have a look at the arguments in the wiki page --> https://github.com/scipy/scipy/wiki/Dropping-support-for-Accelerate

Many problems that are due to this framework are listed there; just look at the examples given by sturlamolden above your comment. It is not an installation problem. OSX support is not being dropped. What changes is the library that the build links to. I don't know who we are or which users will be lost, but all will benefit from this.

Speaking as an ex-academic all the way, the scientific community has all the time in the world to download <50 MB and run two more commands in the terminal to have a more stable environment.

@ilayn, sorry, I misunderstood whether @sturlamolden was citing new incidents of the original issue cropping up rather than unrelated issues.

However, for what it's worth, numpy issue 5533 does not crash my machine with the setup as above (I did not read that thread further than the first message, so apologies if that is not the "real" problem.)

In any case, I do agree that we should have an overall working set of routines...

On the other hand, I can tell you from first-hand experience (both with my own machine and attempting to debug others' setups) that external dependencies are very often a PITA to handle, and are rarely trivial, due to conflicts, upgrades, permissions, environment variables, etc. And, yes, scientists should be savvy enough to deal with this but often do not have the requisite mental bandwidth available after the first failure... So if we do abandon Accelerate, please do make the outcome as frictionless as possible...

(Also, apologies that this isn't the right thread for some of these comments...)

I'm building scipy on Windows machines or rather failing at it continuously I should say 😃 So, please trust me when I say that I share your concerns wholeheartedly.

For what it is worth, the LU code in question does not fail on my system either (macOS 10.12.6, Python 3.6.1, scipy 0.18.1). This illustrates another issue with Accelerate, that fixes to the framework are distributed as OS upgrades, and not backported to previous OS versions.

However, it is still the case that the LAPACK version is old because the interface is frozen on CLAPACK which is abandonware.

I do not have enough RAM to check if the *GEMM integer overflow due to code from ATLAS is still present, but it has a simple fix: raise an exception if arrays have more than 2147483648 elements and we are using Accelerate or ATLAS.
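A minimal sketch of such a guard, assuming the limit is the largest signed 32-bit value and that the caller already knows whether the linked BLAS uses 32-bit integers. The names check_blas_int_limit and uses_32bit_blas are hypothetical and do not exist in NumPy or SciPy.

```python
import math

INT32_MAX = 2**31 - 1  # assumed limit for a BLAS built with 32-bit integers


def check_blas_int_limit(shape, uses_32bit_blas=True):
    """Raise ValueError if an array of `shape` has more elements than a 32-bit BLAS can index.

    Returns the element count when the array is within the limit.
    """
    n_elements = math.prod(shape)
    if uses_32bit_blas and n_elements > INT32_MAX:
        raise ValueError(
            f"array with {n_elements} elements exceeds the "
            f"32-bit BLAS limit of {INT32_MAX}"
        )
    return n_elements
```

The check works on the shape alone, so it can run before any large buffer is handed to the library and does not itself need much memory.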

I do not advocate replacing Accelerate with ATLAS for the macOS wheels, though. ATLAS is as buggy as Accelerate, and less performant. It will have to be OpenBLAS. Or, we could accept the quirks of Accelerate but build a recent LAPACK on top of it.

@defjaf @sturlamolden I have macOS 10.12.6 and the C++ program (compiled with Apple LLVM version 8.1.0 (clang-802.0.42)) does not work (i.e. it crashes with a segfault). Same for the Python examples.

Oh, this is embarrassing. It seems my default SciPy was built with MKL... That is why it didn't segfault...

Rebuilding with clang and Accelerate and it does indeed fail.

(Building with GCC does not even work, because of errors in veclib.h. I guess that must have changed after the upgrade to macOS Sierra...)

Incidentally, openblas is now available in core Homebrew.

OK, I'm confused. My installation definitely doesn't crash -- I use the wheels that get downloaded and installed with pip install numpy scipy (either pip2 or pip3) -- I know that there is a way to check how numpy and scipy are compiled but I don't remember the incantation...
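For reference, both numpy and scipy expose a show_config() function that prints the BLAS/LAPACK build information. The wrapper below is only a convenience sketch: report_build_config is an illustrative name, and the printed format differs between versions.

```python
import importlib


def report_build_config(package_names=("numpy", "scipy")):
    """Print the build configuration of each importable package.

    Returns the list of package names whose configuration was printed.
    """
    reported = []
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            print(f"{name}: not installed")
            continue
        show = getattr(module, "show_config", None)
        if callable(show):
            print(f"--- {name} {getattr(module, '__version__', '?')} ---")
            show()  # prints the BLAS/LAPACK libraries found at build time
            reported.append(name)
    return reported
```

Calling report_build_config() in the interpreter that runs the failing script shows which libraries that particular installation was built against, which is the information that was in doubt here.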

Since the underlying issue is with Accelerate, the C example by @ivannz should also crash - it is strange that this is working for you. Could you double-check?

@cbrnr Yes, it absolutely runs on my machine, which has the latest macOS Sierra 10.12.6 and Xcode (as I think @sturlamolden points out, the issue may be that Apple distributes fixes through OS updates). I have tried g++, clang++ and even some homebrew llvm versions.

But does Apple distribute fixes only to certain users/machines? I'm on the same macOS and Xcode versions...

Just to rule one thing out, do you have Anaconda installed? If so, please remove it (or at least everything on the path). You can never be sure what they include in their distribution (I wouldn't be surprised if they included compilers).

No, this is the latest Python.org framework installs of 2.7.* and 3.6.* with pip installation of packages. I suppose it is possible that these things could vary on different machines, but I have tried on both a late 2014 Retina 5k iMac and a late 2013 Retina MacBook Pro.

which g++?

$ which g++
g++ is /usr/bin/g++

$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer//usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

and just to prove that I'm actually running the code (nb I added some cout << ... to check):

$ g++ ./accelcrash.cpp -o accelcrash -framework accelerate
$ ./accelcrash 
about to do dgetrf_
done
$

Same here (only without the strange double slash in the prefix argument). Maybe for whatever reason this particular number 46336 does not cause an overflow on your machine, but some other number does. It is pretty unlikely that you have a different version of Accelerate if we're on the same OS and Xcode versions.

The whole back and forth only goes to show that Accelerate (and ATLAS) seem to have a lot of bugs that are difficult to reproduce (e.g. NumPy 5533 works for me) and/or won't be fixed in a while, so choosing OpenBLAS or MKL as primary options seems to be a good idea. And maybe I should just install Linux on my MacBook...

@defjaf That is not really g++ but actually a symlink to clang++. Here is roughly what you should see with a real g++:

sturla$ which g++
/usr/local/gfortran/bin/g++

sturla$ g++ --version
g++ (GCC) 6.3.0
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

@sturlamolden Apple doesn't ship gcc - all gcc binaries point to clang alternatives.

@cbrnr Yes, that is what I meant. GCC must be installed separately.

OK, but on my machine the example crashes even when using clang. How is the compiler choice relevant to this discussion (sorry, the thread is getting longer and longer)?

sturla$ clang++ file.cpp -o file -framework accelerate
sturla$ ./file
file(23769,0x7ffff128e3c0) malloc: *** mach_vm_map(size=18446744065123717120) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Segmentation fault: 11

Changing number of rows to 46335:

sturla$ clang++ file.cpp -o file -framework accelerate
sturla$ ./file
sturla$ _

Compiler version:

sturla$ clang++ --version
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Yes, I thought this was the point - it crashes with the default compiler (but not on @defjaf's setup, which also uses clang). What about gcc in this context? It doesn't work with accelerate and spits out compilation errors.

Edit: OK, g++ requires -flax-vector-conversions. No segfault.

What did I just do? Now the C example works with both clang++ and g++... 😱 Did I install any library update with Homebrew? Could this be not related to Accelerate after all?

@cbrnr what about the python version?

sturla$ clang++ -o file file.cpp -flax-vector-conversions -framework accelerate
sturla$ ./file
file(24274,0x7ffff128e3c0) malloc: *** mach_vm_map(size=18446744065123717120) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Segmentation fault: 11

@defjaf Python works too. 😕 I have no idea what's going on...
@sturlamolden I meant -flax-vector-conversions is required to compile with g++.

Perhaps this verbose compiler output might help find any differences in our setups? (Note that the interesting lines are the long ones starting with quote marks where the compilation and linking actually happens...)

$ clang++ -v ./accelcrash.cpp -o accelcrash -framework accelerate
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang" -cc1 -triple x86_64-apple-macosx10.12.0 -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name accelcrash.cpp -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu penryn -target-linker-version 278.4 -v -dwarf-column-info -debugger-tuning=lldb -resource-dir /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.1.0 -stdlib=libc++ -fdeprecated-macro -fdebug-compilation-dir /Users/jaffe/home/python -ferror-limit 19 -fmessage-length 80 -stack-protector 1 -fblocks -fobjc-runtime=macosx-10.12.0 -fencode-extended-block-signature -fcxx-exceptions -fexceptions -fmax-type-align=16 -fdiagnostics-show-option -fcolor-diagnostics -o /var/folders/nc/n7d98d0x46g0h5yyb4bc1gv40000gv/T/accelcrash-b0a9ed.o -x c++ ./accelcrash.cpp
clang -cc1 version 8.1.0 (clang-802.0.42) default target x86_64-apple-darwin16.7.0
ignoring nonexistent directory "/usr/include/c++/v1"
#include "..." search starts here:
#include <...> search starts here:
 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1
 /usr/local/include
 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.1.0/include
 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include
 /usr/include
 /System/Library/Frameworks (framework directory)
 /Library/Frameworks (framework directory)
End of search list.
 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld" -demangle -lto_library /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/libLTO.dylib -no_deduplicate -dynamic -arch x86_64 -macosx_version_min 10.12.0 -o accelcrash /var/folders/nc/n7d98d0x46g0h5yyb4bc1gv40000gv/T/accelcrash-b0a9ed.o -framework accelerate -lc++ -lSystem /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.1.0/lib/darwin/libclang_rt.osx.a

@defjaf my output looks identical. Note that now all of a sudden it works (just like on your machine) and I can't go back to when it didn't.

Yep, this was more directed @sturlamolden: what about you, since you're still seeing the crash(es)?...

Apparently, you have accidentally Microsoft-ified your machines. That's my default daily headscratching position. Things work, based on planets' alignment and weather forecast.

sturla$ $ clang++ -v ./file.cpp -o file -flax-vector-conversions -framework accelerate
-bash: $: command not found

Oops...

sturla$ clang++ -v -o file file.cpp -flax-vector-conversions -framework accelerate
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang" -cc1 -triple x86_64-apple-macosx10.12.0 -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name file.cpp -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu penryn -target-linker-version 278.4 -v -dwarf-column-info -debugger-tuning=lldb -resource-dir /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.1.0 -stdlib=libc++ -fdeprecated-macro -fdebug-compilation-dir /Users/sturla/development/lutest -ferror-limit 19 -fmessage-length 80 -stack-protector 1 -fblocks -fobjc-runtime=macosx-10.12.0 -fencode-extended-block-signature -fcxx-exceptions -fexceptions -fmax-type-align=16 -fdiagnostics-show-option -fcolor-diagnostics -o /var/folders/yl/_1z15lyj0w95_0tpv51sh47m0000gn/T/file-b0d43d.o -x c++ file.cpp
clang -cc1 version 8.1.0 (clang-802.0.42) default target x86_64-apple-darwin16.7.0
ignoring nonexistent directory "/usr/include/c++/v1"
#include "..." search starts here:
#include <...> search starts here:
 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1
 /usr/local/include
 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.1.0/include
 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include
 /usr/include
 /System/Library/Frameworks (framework directory)
 /Library/Frameworks (framework directory)
End of search list.
 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld" -demangle -lto_library /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/libLTO.dylib -no_deduplicate -dynamic -arch x86_64 -macosx_version_min 10.12.0 -o file /var/folders/yl/_1z15lyj0w95_0tpv51sh47m0000gn/T/file-b0d43d.o -framework accelerate -lc++ -lSystem /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.1.0/lib/darwin/libclang_rt.osx.a

@ilayn 🤣 in reality, we should all be using Linux for scientific computing (and actually I am, but just not on my MacBook)...

@sturlamolden No differences other than filenames...

@ilayn It certainly feels that way after the upgrade to Sierra. Now my gfortran does not work because the assembler (as) does not understand the -m flag.

@sturlamolden plain -m? This flag does not exist on Linux either.

Anyway, this discussion is digressing too far from the main issue here. But it's clear that there are some major issues popping up here and there. I'm not sure this is helping in the decision how SciPy should proceed with official macOS builds.

@sturlamolden @cbrnr I agree (but nb. gfortran from homebrew works fine on my machine... perhaps this points to something brittle in your setup?)

I guess Apple hasn't fixed the issue. On the latest Sierra

      System Version: macOS 10.12.6 (16G29)
      Kernel Version: Darwin 16.7.0

I get the segfault for these new constants (just used the binary search from 46335 upward to find a new failure threshold):

    int num_rows=46344, num_cols=110;

Changing 46343 to 46344 or any higher value also causes a segfault. Changing 110 to a higher value, while keeping num_rows = 46343 does not cause a segfault.

I replicated this behaviour both with the standalone C code and the python example at the top of the thread.
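The search described above can be sketched as a standard bisection over num_rows. This assumes the failure is monotonic (every value at or above the threshold fails), which matches what was observed here but is not guaranteed. The fails callback is a placeholder for whatever runs the reproducer, ideally in a subprocess so that a segfault does not kill the search.

```python
def find_failure_threshold(lo, hi, fails):
    """Return the smallest n in (lo, hi] for which fails(n) is True.

    Assumes fails(lo) is False, fails(hi) is True, and that failures are
    monotonic in n.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid  # threshold is at mid or below
        else:
            lo = mid  # threshold is above mid
    return hi


# Stand-in predicate mimicking the behaviour reported above on macOS 10.12.6.
threshold = find_failure_threshold(46335, 60000, lambda n: n >= 46344)
print(threshold)  # 46344
```

Each step halves the interval, so even a range of several million rows needs only a couple of dozen runs of the reproducer.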

PS: There has been no activity around my bug report to Apple so far.

Closing, there's nothing we can do aside from dropping Accelerate.

I just upgraded to macOS High Sierra (10.13) and the bug seems to have been fixed by Apple. All examples above (both Python and C++) work! 🎉

Today I got an email from the developers, asking to test if the bug persists in High Sierra 10.13 GM.

This is a follow-up regarding Bug ID# 30868533.

Please verify this issue with the macOS 10.13 GM and update your bug report at https://bugreport.apple.com/ with your results.

macOS High Sierra 10.13 GM (17A365)
https://developer.apple.com/download/
Posted Date: Sep 25, 2017

If the issue persists, please attach a new sysdiagnose captured in the latest build and attach it to the bug report.

macOS sysdiagnose Instructions:
https://developer.apple.com/services-account/download?path=/OS_X/OS_X_Logs/sysdiagnose_Logging_Instructions.pdf

Unfortunately, I won't be able to test this issue in the near future myself.

@cbrnr Could I ask you, please, if it is not too inconvenient, to try increasing
the num_rows variable here

int num_rows=46344, num_cols=110;

For example, could you try 60000 or higher and if it ever fails use, perhaps,
binary search to pinpoint the threshold of failure? Again, I apologise for not
being able to do this test myself.

Downloading High Sierra now, still half an hour to go. :)

Indeed, the bug seems to be fixed in High Sierra 😃

@ivannz @sturlamolden it works up to about num_rows=5925000, but a little higher and it crashes due to a malloc error. I think this is expected on my 16GB machine, right?

Hm, I can confirm that raising num_rows triggers the error again. At this point the data array is a little less than 5 GB.
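As a sanity check on those numbers, the footprint of the double-precision matrix can be worked out directly. The sketch assumes num_cols is still 110 as in the earlier examples, and array_bytes is an illustrative helper.

```python
def array_bytes(num_rows, num_cols, itemsize=8):
    """Size in bytes of a dense num_rows x num_cols array of `itemsize`-byte elements."""
    return num_rows * num_cols * itemsize


n_bytes = array_bytes(5925000, 110)
print(n_bytes)                    # 5214000000
print(round(n_bytes / 2**30, 2))  # 4.86 (GiB)
```

That is about 4.86 GiB, in line with "a little less than 5 GB". The element count (651750000) is still below 2**31, while the byte count is above 2**32; whether either limit is relevant to the remaining failure is not known.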

And with MKL it actually works with these large matrices... So 👎 for this "fix"...

@cbrnr how do you use MKL on macOS Sierra 10.12.6? I tried to install it with pip install numpy-mkl, but pip returns the error Could not find a version that satisfies the requirement numpy-mkl. It's weird because when I search for numpy-mkl with pip, it is listed 😖

Any ideas?

I use it with Anaconda, but I'd be highly interested if it is possible to install it so that I can use MKL with my Homebrew Python.

@ivannz and @matthew-brett the sample C code provided with this issue shows no problem on the Apple M1. Apple's Accelerate framework is able to leverage the dedicated AMX hardware on the M1. This can provide superior performance for some functions relative to OpenBLAS.

Will SciPy and numpy ever reconsider using Accelerate? I think the strongest argument on the SciPy page is the opaque support Apple has for their tools, making discovery and resolution of bugs hard to track. However, it would be great if the Apple Developer Ecosystem Engineering team could be engaged with core tools like SciPy and numpy in the same way they have helped port Intel's Embree to Apple Silicon.

I'd be 💯 for this! Can we tag someone at Apple who might be able to chime in?

Will SciPy and numpy ever reconsider using Accelerate?

If Apple is interested and addresses the issues outlined at https://github.com/scipy/scipy/wiki/Dropping-support-for-Accelerate, then of course we can reconsider. We support MKL as well which is not open source, because Intel maintains it well and when there's an issue we have actual humans to talk to.

I think the strongest argument on the SciPy page is the opaque support Apple has for their tools, making discovery and resolution of bugs hard to track.

Accelerate being stuck on a too-low version of LAPACK (3.2.1 last time I checked) is also a real blocker. Our minimum is 3.4.1 at the moment I believe, and we may gradually increase that minimum. A lowest version that's from 2012 isn't too much to ask.
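Checking a LAPACK version against a minimum is a plain tuple comparison on the triple that scipy.linalg.lapack.ilaver() returns (shown earlier in the thread as (3, 2, 1) for Accelerate). The helper meets_minimum is illustrative, and its default simply mirrors the 3.4.1 figure mentioned above, which was stated from memory.

```python
def meets_minimum(version, minimum=(3, 4, 1)):
    """True if a (major, minor, patch) LAPACK version is at least `minimum`."""
    return tuple(version) >= tuple(minimum)


print(meets_minimum((3, 2, 1)))  # False
print(meets_minimum((3, 4, 1)))  # True
```

Tuples compare element by element, so (3, 10, 0) correctly ranks above (3, 4, 1), which a string comparison of "3.10.0" and "3.4.1" would get wrong.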

However, it would be great if the Apple Developer Ecosystem Engineering team could be engaged with core tools like SciPy and numpy in the same way they have helped port Intel's Embree to Apple Silicon.

I'd be 💯 for this! Can we tag someone at Apple who might be able to chime in?

@lawrence-danna-apple given that you worked on M1 support for Python packaging, maybe you have an opinion on this topic or know who to ping?

The main issues are:

  • Accelerate is not fork safe due to the GCD (libdispatch) threadpool.
  • Accelerate does not support 64-bit integers in BLAS

The LAPACK version in Accelerate is not really an issue, as we can build LAPACK from Fortran sources on top of BLAS (like we do for OpenBLAS). Accelerate also uses f2c ABI, which is annoying, but not a complete blocker.

While I wholeheartedly agree that M1 users should be able to join the number-crunching community if Accelerate is overhauled, I am not sure whether Apple would bother with it, since the framework contains not just BLAS/LAPACK but all kinds of extras that would need redeveloping.

Since even the LAPACK devs have joined the fast GitHub track and started a much quicker release cycle, turtle-speed, cold-war-style responses won't cut it for us to sustain a maintenance effort, so there are many meta issues as well as technical ones.