pydata / sparse

Sparse multi-dimensional arrays for the PyData ecosystem

Home Page: https://sparse.pydata.org

Segmentation fault on arm64

tillea opened this issue · comments

Describe the bug
When running the test suite on the arm64 architecture, Python 3.11 segfaults.

To Reproduce
The Debian continuous integration test runs on all Debian release architectures. While it passes on amd64, it fails on arm64 and other architectures. Feel free to check the full build log.

Expected behavior
The test suite should pass on all architectures.

System

  • OS and version: Debian unstable
  • sparse version: 0.15.1
  • NumPy version: 1.24.2
  • Numba version: 0.57.1

Kind regards, Andreas.

This is likely a problem with Numba code generation -- do the tests pass with Py3.10 and below on other architectures?

The tests used to pass with Py3.10. You can check the list of architectures, including test logs, on our CI page.

Let me rephrase, do the tests pass with Python 3.11 and sparse 0.14, but Numba 0.57.1? How about Python 3.10, sparse 0.15.1 and Numba 0.57.1?

I unfortunately don't have access to an ARM64 machine, so I cannot debug this personally, and would rely on reporters to isolate the issue.
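
To make it easier to compare the combinations asked about above, here is a minimal sketch that prints the interpreter version, the machine architecture, and the installed versions of the packages in question. The helper name `collect_versions` is made up for illustration; it only reads package metadata and does not import Numba, so it should be safe to run even where the test suite crashes.

```python
import platform
from importlib import metadata


def collect_versions(packages=("sparse", "numpy", "numba", "llvmlite")):
    """Return interpreter, architecture, and installed package versions.

    `collect_versions` is a hypothetical helper, not part of sparse or Numba.
    """
    info = {
        "python": platform.python_version(),
        "machine": platform.machine(),  # e.g. "aarch64" or "x86_64"
    }
    for name in packages:
        try:
            # Reads distribution metadata only; the package is not imported.
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Pasting this output for each environment (for example Python 3.10 versus 3.11) next to the pass/fail result would help narrow down which component changed.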

@mtsokol IIRC you had a Mac, is that Apple Silicon by any chance? Could you reproduce this bug with the software versions mentioned?

Unfortunately my Mac is an ancient MacBook Pro 2015 with Intel i7.

I've attempted to fix this in #634, please re-open if the issue isn't resolved.

Hi,
(sorry, I cannot find a re-open button)
I tried tag 0.16.0a4 (not sure whether this is considered alpha?) and the problem persists. In addition, I tried the amd64 test, which fails as well.
Kind regards, Andreas.

@tillea I just tested locally, it doesn't fail for me in a Docker container -- You might want to look at numba/numba#9109 (comment) and backporting llvm/llvm-project@2e1b838 to Debian's LLVM 14.

Relevant LLVM issue: llvm/llvm-project#61402

Andreas asked me to help out with this bug as he has new Debian project leader responsibilities. I was slowly trying to help deal with the numba side of the problems, but fell behind on understanding the llvm fix.
Currently I'm trying to get the llvmlite maintainer to update llvmlite so I can release numba 0.59.1.

@detrout Thank you for helping out -- Some background from reading the Numba issue: the bug isn't in Numba itself, but in Debian's LLVM 14 (and in release LLVM 14, IIUC). It doesn't show up with Numba from PyPI or conda-forge because llvmlite on PyPI and LLVM 14 from conda-forge already have the LLVM patch applied, which is why I think backporting the patch might help.

I recently ran the test suite on both an Apple Silicon Mac as well as multiple arm64-based containers trying to reproduce this, but the test suite ran fine. Can anyone, maybe @detrout, check what happens if llvmlite and numba are installed via PyPI instead of via apt? That would confirm a packaging issue, and would point to #628 (comment) being a possible cause.
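
When comparing the PyPI and apt installs, it helps to confirm which copy of each package Python actually picks up. The sketch below is a rough heuristic and rests on an assumption: that Debian's apt packages live under `/usr/lib/python3/dist-packages`, while pip, venv, or conda installs live elsewhere. The function names are made up for illustration. It uses `importlib.util.find_spec`, which locates a top-level module without importing it.

```python
import importlib.util
from pathlib import PurePosixPath


def classify_origin(path):
    """Classify a module file path as apt-installed or not.

    Assumption: Debian's apt packages install under
    /usr/lib/python3/dist-packages; anything else is treated as
    pip, venv, or conda.
    """
    parts = PurePosixPath(path).parts
    if parts[:5] == ("/", "usr", "lib", "python3", "dist-packages"):
        return "apt"
    return "pip-or-other"


def install_origin(module_name):
    """Return (path, classification) for a top-level module without importing it."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None, "not installed"
    return spec.origin, classify_origin(spec.origin)


if __name__ == "__main__":
    for name in ("llvmlite", "numba", "sparse"):
        print(name, *install_origin(name))
```

If `llvmlite` and `numba` report a non-apt location and the test suite then passes, that would support the packaging explanation.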