Syntax alignment with numpy
fabian-sp opened this issue · comments
Hi, I just found this repository, as I need to use a sparse data format inside Numba functions. In my existing code, which uses NumPy arrays, I mainly need row indexing and dot products.
Is it possible to use `csr` with the same syntax for these operations? That is, something like `A[i,:]` that works when `A` is a standard NumPy array but also when `A` is a `csr.CSR` object? This would help me avoid a lot of duplicate code.
Thank you for your help!
Thanks for your interest in the package, and sorry for missing this issue in my notifications! I don't see any reason why that wouldn't be possible - just a matter of implementing it. If I have time I might look at it; also happy to review a pull request.
I should clarify a bit further. It is definitely possible to have this available in Python functions. I do not know if it will be possible to make that syntax available in Numba-compiled functions - will need to review Numba's support for overloading magic methods. I see that Numba 0.56 added that support for jitclasses, but CSR does not use a jitclass (because jitclasses weren't stable enough and didn't support everything I needed).
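To illustrate the Python-level part of this, here is a minimal sketch of how `A[i, :]` can be made to work through `__getitem__`. `TinyCSR` and `row_dot` are hypothetical names invented for this example; this is not the `csr.CSR` class or its API, and it only covers plain (non-Numba) Python with non-negative row indices.

```python
import numpy as np


class TinyCSR:
    """Hypothetical minimal CSR container (NOT the csr package's CSR class)."""

    def __init__(self, rowptrs, colinds, values, ncols):
        self.rowptrs = np.asarray(rowptrs, dtype=np.int64)
        self.colinds = np.asarray(colinds, dtype=np.int64)
        self.values = np.asarray(values, dtype=np.float64)
        self.ncols = ncols

    @classmethod
    def from_dense(cls, dense):
        dense = np.asarray(dense, dtype=np.float64)
        # np.nonzero returns indices in row-major order, so rows come out sorted.
        rows, cols = np.nonzero(dense)
        counts = np.bincount(rows, minlength=dense.shape[0])
        rowptrs = np.concatenate(([0], np.cumsum(counts)))
        return cls(rowptrs, cols, dense[rows, cols], dense.shape[1])

    def __getitem__(self, key):
        # Only the A[i, :] form with a non-negative integer i is handled here.
        i, cols = key
        if cols != slice(None):
            raise NotImplementedError("only A[i, :] is supported in this sketch")
        start, end = self.rowptrs[i], self.rowptrs[i + 1]
        out = np.zeros(self.ncols)  # row i returned as a dense vector
        out[self.colinds[start:end]] = self.values[start:end]
        return out


def row_dot(A, i, x):
    """Same code path for an ndarray and for a TinyCSR."""
    return A[i, :] @ x
```

Returning a dense vector from `A[i, :]` is only one possible choice; whether a single-row index should yield a dense or a sparse result is a design decision this sketch does not settle.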
Hi, thank you for your reply. I have worked on this a bit in the meantime, and what I basically need is functionality that gives me a subset of rows, something like `A[S,:]` where `S` is a list of integers. I currently do this with your `.row` function, looping over `S`, but I guess that this is not very efficient. Later on I only need to do matrix-vector products with `A[S,:]`.
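For reference, the row-subset-then-multiply pattern described here can be sketched directly on raw CSR arrays. The function names `take_rows` and `matvec` and the argument names are assumptions made for this example, not functions from the `csr` package. The loops use only basic NumPy operations, so they would likely be candidates for `numba.njit`, but that is untested here.

```python
import numpy as np


def take_rows(rowptrs, colinds, values, S):
    """Build CSR arrays for the submatrix A[S, :] (hypothetical helper)."""
    S = np.asarray(S, dtype=np.int64)
    lengths = rowptrs[S + 1] - rowptrs[S]  # number of stored entries per selected row
    sub_rowptrs = np.zeros(len(S) + 1, dtype=np.int64)
    sub_rowptrs[1:] = np.cumsum(lengths)
    sub_colinds = np.empty(sub_rowptrs[-1], dtype=colinds.dtype)
    sub_values = np.empty(sub_rowptrs[-1], dtype=values.dtype)
    for k in range(len(S)):
        src = rowptrs[S[k]]   # where row S[k] starts in the source arrays
        dst = sub_rowptrs[k]  # where it lands in the output arrays
        n = lengths[k]
        sub_colinds[dst:dst + n] = colinds[src:src + n]
        sub_values[dst:dst + n] = values[src:src + n]
    return sub_rowptrs, sub_colinds, sub_values


def matvec(rowptrs, colinds, values, x):
    """Compute y = A @ x for a matrix given by its CSR arrays."""
    nrows = len(rowptrs) - 1
    y = np.zeros(nrows, dtype=np.float64)
    for i in range(nrows):
        for j in range(rowptrs[i], rowptrs[i + 1]):
            y[i] += values[j] * x[colinds[j]]
    return y
```

If only products with `A[S,:]` are needed, building the submatrix once and reusing it avoids extracting rows one at a time on every product.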
Thanks! Looks like we need support for a few cases:
- Single integer (easy, but unclear what return type should be - should it be a dense vector or a sparse one?)
- Slice (logic already exists, but isn't exposed through the index API)
- Sequence of indices (needs a loop as you say; this can be efficiently implemented in Numba)
- Sequence of booleans (will also need a loop)
I think that covers all the common indexers.
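One way to handle the four cases uniformly is to normalize every indexer to an array of row numbers first, so a single row-gathering routine can serve all of them. The sketch below is an assumption about how this could be done, not code from the `csr` package; `normalize_rows` is a hypothetical name. It deliberately sidesteps the return-type question for a single integer by treating it as a one-row selection.

```python
import numpy as np


def normalize_rows(key, nrows):
    """Turn an int, slice, integer sequence, or boolean mask into row indices."""
    if isinstance(key, slice):
        # slice.indices resolves None and negative bounds against nrows.
        return np.arange(*key.indices(nrows), dtype=np.int64)
    arr = np.asarray(key)
    if arr.dtype == np.bool_:
        if arr.shape != (nrows,):
            raise IndexError("boolean mask must have one entry per row")
        return np.flatnonzero(arr).astype(np.int64)
    if not np.issubdtype(arr.dtype, np.integer):
        raise TypeError("unsupported row indexer")
    idx = np.atleast_1d(arr).astype(np.int64)  # a single integer becomes length 1
    idx = np.where(idx < 0, idx + nrows, idx)  # wrap negative indices like NumPy
    if idx.size and (idx.min() < 0 or idx.max() >= nrows):
        raise IndexError("row index out of range")
    return idx
```

A caller could then pass the resulting index array to whatever routine gathers rows, and decide separately whether a single-integer key should be unwrapped into a vector.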
@fabian-sp There are now row functions that support arrays of row indices, in the new 0.5 release that will come out shortly.
Specifically, `.row()` now takes an `ndarray`.