lenskit / csr

Compressed sparse matrices

Home Page: https://csr.lenskit.org


Syntax alignment with numpy

fabian-sp opened this issue · comments

Hi, I just found this repository as I need to use a sparse data format inside numba functions. In my existing code, which uses numpy arrays, I mainly need row indexing and dot products.

Is it possible to use csr with the same syntax for these operations? That is, something like A[i,:] that works when A is a standard numpy array but also when A is a csr.CSR object? This would help me avoid a lot of duplicate code.

Thank you for your help!

Thanks for your interest in the package, and sorry for missing this issue in my notifications! I don't see any reason why that wouldn't be possible - just a matter of implementing it. If I have time I might look at it; also happy to review a pull request.

I should clarify a bit further. It is definitely possible to have this available in Python functions. I do not know if it will be possible to make that syntax available in Numba-compiled functions - will need to review Numba's support for overloading magic methods. I see that Numba 0.56 added that support for jitclasses, but CSR does not use a jitclass (because jitclasses weren't stable enough and didn't support everything I needed).
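As a rough illustration of the Python-level case only, here is a minimal sketch of how A[i, :] could be supported through __getitem__. The RowIndexable class and its attribute names are hypothetical and are not part of csr; it simply holds raw CSR arrays and returns a dense row, which is one of several possible return types. It says nothing about whether the same syntax can work inside Numba-compiled functions.

```python
import numpy as np


class RowIndexable:
    """Hypothetical wrapper around raw CSR arrays (not the csr.CSR class)."""

    def __init__(self, nrows, ncols, rowptrs, colinds, values):
        self.nrows = nrows
        self.ncols = ncols
        self.rowptrs = np.asarray(rowptrs)
        self.colinds = np.asarray(colinds)
        self.values = np.asarray(values, dtype=np.float64)

    def __getitem__(self, key):
        # Only the A[i, :] form is handled in this sketch.
        if not (isinstance(key, tuple) and len(key) == 2):
            raise IndexError("expected A[i, :]")
        i, cols = key
        if not (isinstance(cols, slice) and cols == slice(None)):
            raise IndexError("only full-row selection is supported")
        i = int(i)
        if i < 0:
            i += self.nrows
        if not 0 <= i < self.nrows:
            raise IndexError("row index out of range")
        # Scatter the stored entries of row i into a dense vector.
        start, end = self.rowptrs[i], self.rowptrs[i + 1]
        row = np.zeros(self.ncols)
        row[self.colinds[start:end]] = self.values[start:end]
        return row


# 3x4 example matrix: [[1, 0, 2, 0], [0, 0, 0, 0], [0, 3, 0, 4]]
A = RowIndexable(3, 4, [0, 2, 2, 4], [0, 2, 1, 3], [1.0, 2.0, 3.0, 4.0])
dense = np.array([[1.0, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 4.0]])
x = np.arange(4.0)

# The same expression works for the wrapper and for the dense array.
sparse_dot = A[2, :] @ x
dense_dot = dense[2, :] @ x
```

Returning a dense row keeps the calling code identical to the numpy version, at the cost of allocating ncols entries per access.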

Hi, thank you for your reply. I have worked on this a bit in the meantime, and what I basically need is a way to get a subset of rows, something like A[S,:] where S is a list of integers. I currently do this by looping over S and calling your .row function, but I guess that this is not very efficient. Later on I only need matrix-vector products with A[S,:].
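If only the product A[S,:] @ x is needed, the row subset does not have to be materialized at all. The sketch below assumes direct access to the three CSR arrays (row pointers, column indices, values); the function name subset_matvec and the argument names are made up for illustration and are not csr API. It is written with explicit loops in the style Numba usually handles well, but it has not been tested under numba.njit here.

```python
import numpy as np


def subset_matvec(rowptrs, colinds, values, rows, x):
    """Compute A[rows, :] @ x from raw CSR arrays without copying any rows."""
    rows = np.asarray(rows, dtype=np.int64)
    out = np.zeros(len(rows), dtype=np.float64)
    for k in range(len(rows)):
        # Stored entries of row rows[k] live in positions start..end-1.
        start = rowptrs[rows[k]]
        end = rowptrs[rows[k] + 1]
        acc = 0.0
        for j in range(start, end):
            acc += values[j] * x[colinds[j]]
        out[k] = acc
    return out


# 3x4 example matrix: [[1, 0, 2, 0], [0, 0, 0, 0], [0, 3, 0, 4]]
rowptrs = np.array([0, 2, 2, 4])
colinds = np.array([0, 2, 1, 3])
values = np.array([1.0, 2.0, 3.0, 4.0])
dense = np.array([[1.0, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 4.0]])

x = np.array([1.0, 2.0, 3.0, 4.0])
S = [2, 0]
result = subset_matvec(rowptrs, colinds, values, S, x)
```

The work is proportional to the number of stored entries in the selected rows, so empty rows cost almost nothing.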

Thanks! Looks like we need support for a few cases:

  • Single integer (easy, but unclear what return type should be - should it be a dense vector or a sparse one?)
  • Slice (logic already exists, but isn't exposed through the index API)
  • Sequence of indices (needs a loop as you say; this can be efficiently implemented in Numba)
  • Sequence of booleans (will also need a loop)

I think that covers all the common indexers.
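One possible way to handle the four cases above is to reduce every indexer to an array of row indices first, so that a single row-gathering loop serves all of them. This is only a sketch: normalize_row_indexer is a hypothetical helper, not something csr provides, and it deliberately sidesteps the open question of what a single integer should return by treating it as a one-row selection.

```python
import numpy as np


def normalize_row_indexer(key, nrows):
    """Map an int, slice, integer sequence, or boolean mask to row indices."""
    if isinstance(key, slice):
        # slice.indices resolves None and negative bounds against nrows.
        return np.arange(*key.indices(nrows), dtype=np.int64)
    if isinstance(key, (int, np.integer)) and not isinstance(key, bool):
        idx = np.array([key], dtype=np.int64)
    else:
        arr = np.asarray(key)
        if arr.dtype == np.bool_:
            if arr.shape != (nrows,):
                raise IndexError("boolean mask must have one entry per row")
            return np.flatnonzero(arr)
        if arr.ndim != 1 or not np.issubdtype(arr.dtype, np.integer):
            raise IndexError("unsupported row indexer")
        idx = arr.astype(np.int64)
    # Wrap negative indices the way numpy does, then bounds-check.
    idx = np.where(idx < 0, idx + nrows, idx)
    if np.any((idx < 0) | (idx >= nrows)):
        raise IndexError("row index out of range")
    return idx


from_int = normalize_row_indexer(2, 5)
from_slice = normalize_row_indexer(slice(1, 4), 5)
from_ints = normalize_row_indexer([-1, 0], 5)
from_mask = normalize_row_indexer([True, False, True, False, False], 5)
```

Note that an empty Python list is rejected by this sketch, because numpy infers a float dtype for it; a real implementation would probably want to accept it.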

@fabian-sp There are now row functions that support arrays of row indices, in the new 0.5 release that will come out shortly.

Specifically, .row() now takes an ndarray.