GCXS slice bug

israelmcmc opened this issue 2 years ago · comments

Describe the bug
The slice operator (__getitem__) produces incorrect results in some cases for the class GCXS.

To Reproduce
I found this error while working with a large matrix. This is the minimal version that reproduces it:

from sparse import DOK

# A 7-dimensional array with a single nonzero entry
dok = DOK(shape=(12, 1147, 12, 32, 1147, 3, 3))
dok[1, 1, 1, 1, 1, 1, 1] = 5
coo = dok.to_coo()
gcxs = coo.asformat('gcxs')

print( coo[1:11, 1:2, 1:11, 1:31, 1:1146, 1:2, 1:2])
print(gcxs[1:11, 1:2, 1:11, 1:31, 1:1146, 1:2, 1:2])

Results in

<COO: shape=(10, 1, 10, 30, 1145, 1, 1), dtype=float64, nnz=1, fill_value=0.0>
<GCXS: shape=(10, 1, 10, 30, 1145, 1, 1), dtype=float64, nnz=0, fill_value=0.0, compressed_axes=(5,)>

They should be the same.

I tried to reproduce it on small arrays, but those worked fine. On medium-sized arrays like the following, however, exactly the same code sometimes works and sometimes doesn't, seemingly at random:

dok = DOK(shape=(4, 5, 6, 7, 8, 9, 10))
dok[1, 1, 1, 1, 1, 1, 1] = 5
coo = dok.to_coo()
gcxs = coo.asformat('gcxs')

print( coo[1:3, 1:2, 1:5, 1:6, 1:7, 1:8, 1:9])
print(gcxs[1:3, 1:2, 1:5, 1:6, 1:7, 1:8, 1:9])

Sometimes results in:

<COO: shape=(2, 1, 4, 5, 6, 7, 8), dtype=float64, nnz=1, fill_value=0.0>
<GCXS: shape=(2, 1, 4, 5, 6, 7, 8), dtype=float64, nnz=1, fill_value=0.0, compressed_axes=(0,)>

And if I keep executing the cell, sometimes it outputs:

<COO: shape=(2, 1, 4, 5, 6, 7, 8), dtype=float64, nnz=1, fill_value=0.0>
<GCXS: shape=(2, 1, 4, 5, 6, 7, 8), dtype=float64, nnz=0, fill_value=0.0, compressed_axes=(0,)>
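To quantify how often the slice comes back empty rather than re-running the cell by hand, the repeated runs can be tallied. This is only a sketch: `tally_nnz` and `gcxs_slice` are hypothetical helper names, and the only sparse calls used are the ones already shown in the example above.

```python
from collections import Counter


def tally_nnz(make_slice, runs=20):
    """Call make_slice() `runs` times and count how often each nnz value occurs."""
    return Counter(make_slice().nnz for _ in range(runs))


def gcxs_slice():
    """Rebuild the medium-sized example and return the GCXS slice."""
    # Imported here so tally_nnz stays usable without the library installed.
    import sparse

    dok = sparse.DOK(shape=(4, 5, 6, 7, 8, 9, 10))
    dok[1, 1, 1, 1, 1, 1, 1] = 5
    gcxs = dok.to_coo().asformat('gcxs')
    return gcxs[1:3, 1:2, 1:5, 1:6, 1:7, 1:8, 1:9]
```

Calling `tally_nnz(gcxs_slice)` returns a Counter keyed by nnz. If slicing were deterministic and correct, the only key would be 1.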

Expected behavior
COO and GCXS should give consistent results. In particular, for the examples above the result should always have nnz=1 (COO gets it right).

System

OS and version: macOS 12.3.1
sparse version (sparse.__version__): 0.13.0
NumPy version (np.__version__): 1.21.5
Numba version (numba.__version__): 0.53.0

Additional context
None

James Webber · Answer 1 · Mon Nov 21 2022 04:13:43 GMT+0800 (China Standard Time)

Can confirm that this is happening on master.

I was curious whether it might be related to asformat('gcxs') trying to guess the best axis to compress, so I started passing different kwargs. The failure is somewhat random but mostly occurs regardless of the compressed axis. With compressed_axes=(1,), though, it often crashes outright. 🙃 Not with 4, though.

It does print the first line though so it's indeed in __getitem__, not in conversion...

James Webber · Answer 2 · Mon Nov 21 2022 04:28:09 GMT+0800 (China Standard Time)

Based on the crash I'm guessing this is in get_slicing_selection because that's a numba-fied function.

edit: Interestingly, if I disable the use of that function I get way crazier errors: nnz goes up to 60-80 somehow. Maybe there's an assumption in get_array_selection that I'm missing, but that seems wrong.

James Webber · Answer 3 · Mon Nov 21 2022 05:20:14 GMT+0800 (China Standard Time)

I was wrong. I think it's actually somewhere in sparse.convert.convert_to_flat (at least, the crash is coming from there; I'm not sure about the bug).

James Webber · Answer 4 · Mon Nov 21 2022 08:01:52 GMT+0800 (China Standard Time)

Sunday Funday 🎉

I think I tracked this down but I still don't fully understand why it's happening so I can't say for sure.

The issue is in convert.py. The block in question checks whether the current position is at the end of a block, and if so it resets the counter and increments the next-higher dimension. However, it only handled a single carry, not the case where the carry has to propagate through several dimensions. This led to out-of-bounds access in some cases and (I guess?) incorrect indexing in others.

The fix I have is simple but I am not sure it is complete:

  for i in range(operations):
-     if i != 0 and positions[pos] == increments[pos].shape[0]:
+     while i != 0 and positions[pos] == increments[pos].shape[0]:
          positions[pos] = 0
          pos -= 1
          positions[pos] += 1
-         pos += 1
+     pos = len(increments) - 2  # resetting to the initial value
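The difference between the single `if` and the `while` can be seen in a simplified pure-Python model of the counter. This is a sketch and not the code in convert.py: `enumerate_positions`, `sizes`, and `cascade` are hypothetical names, and `sizes[j]` stands in for `increments[j].shape[0]`.

```python
from itertools import product


def enumerate_positions(sizes, cascade):
    """Step a mixed-radix counter where digit j should stay in range(sizes[j]).

    cascade=True propagates the carry through every full digit (the `while`
    version); cascade=False carries only once (the original `if` version).
    """
    last = len(sizes) - 1
    positions = [0] * len(sizes)
    total = 1
    for s in sizes:
        total *= s

    visited = []
    for i in range(total):
        pos = last  # start each step at the fastest-moving digit
        if cascade:
            # Keep carrying while the current digit has overflowed.
            while i != 0 and positions[pos] == sizes[pos]:
                positions[pos] = 0
                pos -= 1
                positions[pos] += 1
        elif i != 0 and positions[pos] == sizes[pos]:
            # Single carry: the next digit may now be out of range.
            positions[pos] = 0
            pos -= 1
            positions[pos] += 1
        visited.append(tuple(positions))
        positions[last] += 1
    return visited


sizes = [2, 2, 2]
good = enumerate_positions(sizes, cascade=True)
bad = enumerate_positions(sizes, cascade=False)

# The cascading version visits every valid combination in row-major order.
print(good == list(product(*(range(s) for s in sizes))))
# The single-carry version produces digits outside their valid range.
print([p for p in bad if any(d >= s for d, s in zip(p, sizes))])
```

In this model the single-carry version lets the middle digit run past its size, which corresponds to indexing past the end of an `increments` array. That is consistent with the out-of-bounds access described above, though the model does not explain the intermittent behavior.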

I'll open a PR but probably @hameerabbasi should review and maybe there is a better solution.

Hameer Abbasi · Answer 5 · Mon Nov 21 2022 09:41:46 GMT+0800 (China Standard Time)

I think @daletovar wrote that code, he's the best person to review.