pydata / sparse

Sparse multi-dimensional arrays for the PyData ecosystem

Home Page: https://sparse.pydata.org

Can't create COO object if shape parameter is ndarray

arzdou opened this issue · comments

When creating a new sparse COO object there is an error if the shape is an ndarray instead of a list.

The error can be encountered as follows:

import numpy as np
import sparse

coords = [[0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4]]

data = [10, 20, 30, 40, 50]

s = sparse.COO(coords, data, shape=np.array((5, 5)))

Which results in the following traceback:

File "<stdin>", line 1, in <module>
File "path\to\directory\.venv\lib\site-packages\sparse\_coo\core.py", line 244, in __init__
    if shape and not self.coords.size:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The error is produced when shape is cast to a bool. For a list, the result is False if it is empty and True otherwise. For an ndarray with more than one element, the truth value is ambiguous because it would have to be decided element by element, so it cannot be evaluated in an if statement.
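The difference can be reproduced without sparse at all. The following is a minimal sketch using only NumPy; the variable names are illustrative and do not come from the library's source.

```python
import numpy as np

shape_tuple = (5, 5)
shape_array = np.array([5, 5])

# A non-empty tuple (or list) has a well-defined truth value.
tuple_truth = bool(shape_tuple)

# An ndarray with more than one element refuses to be reduced to one bool.
try:
    bool(shape_array)
    array_error = None
except ValueError as exc:
    array_error = exc

# Converting the array to a list first restores the list behaviour.
list_truth = bool(shape_array.tolist())
```

Here `array_error` holds the same kind of ValueError that appears in the traceback above, while both the tuple and the converted list evaluate without error.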

The solution is simply to convert shape to a list before operating on it.

System

  • Windows 10
  • sparse version 0.13.0
  • NumPy version 1.21.5
  • Numba version 0.55.1

I believe you are using the library incorrectly. This is the correct usage:

import numpy as np
import sparse

coords = [[0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4]]

data = [10, 20, 30, 40, 50]

s = sparse.COO(coords, data, shape=(5, 5))

If you need an array's shape, pass in array.shape.

I know that the intended type for shape is a list, but there might be a situation where the variable you are trying to use is an array.
For example, loading hdf5 files returns the shape as an array and thus, when using it as an input, the exception is raised.

I might be missing something here, but perhaps you can just do sparse.COO(coords, data, shape=arr.shape) instead of, for example, sparse.COO(coords, data, shape=arr)?

If I am missing something: in your example, how would an array be interpreted as a shape? How would you get the "overall shape" of a 2-dimensional rectangular array? Perhaps what you are looking for are non-rectangular or jagged arrays, which currently aren't supported.

For what it's worth, it looks like scipy.sparse handles shapes specified as arrays:

>>> sparse.coo_matrix(([0], ([0], [0])), shape=np.array([5, 5]))
<5x5 sparse matrix of type '<class 'numpy.int64'>'
	with 1 stored elements in COOrdinate format>

I think there was a misunderstanding. What I mean by array is the type of the shape input.

In both of our examples the shape of the matrix is 5 by 5. In your case you input the shape as the tuple (5, 5) and in mine as the array np.array([5, 5]). They describe exactly the same shape, but when using the array an exception is raised.

This is because line 244 of _coo\core.py converts shape to a bool, which behaves differently depending on whether shape is a tuple or an array.

bool((5, 5))            -> True
bool(np.array([5, 5]))  -> ValueError (the truth value is ambiguous)

Transforming shape into a list will fix it.
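One way such a conversion could look is sketched below. `normalize_shape` is a hypothetical helper written for this illustration; it is not part of sparse, and the actual fix in the library may differ.

```python
import operator

import numpy as np


def normalize_shape(shape):
    """Hypothetical helper: return `shape` as a tuple of Python ints.

    Accepts None, a list, a tuple, or a 1-D integer ndarray.
    """
    if shape is None:
        return None
    # operator.index rejects non-integer entries such as 5.5.
    return tuple(operator.index(n) for n in shape)


shape = normalize_shape(np.array((5, 5)))

# A tuple can be used safely in a check like `if shape and ...`.
has_shape = bool(shape)
```

Until something like this is done inside the library, the same conversion can be applied on the caller's side before passing shape to sparse.COO.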

Ah, I understand. You are looking to use a 1-D array of integers as a shape.

Yes, this is easily fixed. I'm willing to accept a PR, or I can try to do it over the weekend.

Please feel free to ping me if I don't.