pydata / sparse

Sparse multi-dimensional arrays for the PyData ecosystem

Home Page: https://sparse.pydata.org

Consider MatRepr for `__repr__` and `_repr_html_`

alugowski opened this issue

I've noticed that there appears to be no native way to print or display a sparse matrix.
The docs always convert to dense for display, and the native methods only emit some metadata.
Calling `todense()` can be troublesome when the matrix is large, and the result does not show the sparsity pattern.
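
To make the sparsity point concrete, here is a minimal plain-Python sketch of rendering a 2D matrix directly from COO triplets (row indices, column indices, values) without building a dense array. The function `render_coo_2d` and its `max_rows`/`max_cols` parameters are hypothetical and are not part of matrepr or pydata/sparse. Only stored entries inside the visible window are placed in the grid, and cells with no stored entry stay blank.

```python
def render_coo_2d(rows, cols, data, shape, max_rows=10, max_cols=10):
    """Render the stored entries of a 2D COO matrix as a text grid.

    Hypothetical helper. Only the top-left max_rows x max_cols window is
    drawn, so memory use depends on the window size and the number of
    stored entries, not on the full dense size of the matrix.
    """
    n_rows, n_cols = shape
    shown_rows = min(n_rows, max_rows)
    shown_cols = min(n_cols, max_cols)

    # Collect only the stored entries that fall inside the visible window.
    cells = {}
    for i, j, v in zip(rows, cols, data):
        if i < shown_rows and j < shown_cols:
            cells[(i, j)] = str(v)

    width = max((len(s) for s in cells.values()), default=1)
    lines = []
    for i in range(shown_rows):
        # Cells without a stored entry are left blank.
        row = [cells.get((i, j), "").rjust(width) for j in range(shown_cols)]
        if shown_cols < n_cols:
            row.append("...")  # more columns exist beyond the window
        lines.append("[" + " ".join(row) + "]")
    if shown_rows < n_rows:
        lines.append("...")  # more rows exist beyond the window
    return "\n".join(lines)


print(render_coo_2d([0, 1, 2], [0, 2, 1], [1.5, 2.0, 3.25], shape=(3, 3)))
```

A real formatter would also need to handle column alignment across mixed types and higher-dimensional arrays; this only shows why blank cells make the sparsity pattern visible.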

I went ahead and added pydata/sparse support to matrepr.

You can see it in action with pydata/sparse matrices (1D, 2D, 3D) in this Jupyter notebook.

For the Python REPL, there is a simple monkey-patch that replaces `__repr__()` so matrices are printed in an interactive Python shell. Enable it with `import matrepr.patch.sparse`.
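
A monkey-patch like this works because `repr()` looks up `__repr__` on the object's type, so assigning a new function to the class attribute changes how every existing and future instance prints. The sketch below shows the general technique with a hypothetical `ToyCOO` class and a hypothetical `install_repr_patch()` helper; it is not the actual contents of `matrepr.patch.sparse`, which applies its patch as a side effect of being imported.

```python
class ToyCOO:
    """Hypothetical stand-in for a sparse array class."""

    def __init__(self, coords, data, shape):
        self.coords = coords  # list of (row, col) tuples
        self.data = data
        self.shape = shape

    def __repr__(self):
        # Original metadata-only repr.
        return f"<ToyCOO: shape={self.shape}, nnz={len(self.data)}>"


def _entries_repr(self):
    # Replacement repr that lists the stored entries.
    body = ", ".join(f"{ij}: {v}" for ij, v in zip(self.coords, self.data))
    return f"ToyCOO(shape={self.shape}, {{{body}}})"


def install_repr_patch(cls):
    """Replace cls.__repr__ and return the original so it can be restored."""
    original = cls.__repr__
    cls.__repr__ = _entries_repr
    return original


a = ToyCOO([(0, 0), (1, 2)], [1.0, 5.0], (2, 3))
print(repr(a))  # metadata-only
original = install_repr_patch(ToyCOO)
print(repr(a))  # the existing instance now uses the patched repr
ToyCOO.__repr__ = original  # undo the patch
```

Keeping a handle to the original method makes the patch reversible, which is handy when experimenting in a long-running shell.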

Is this something of interest to the pydata/sparse community?

[Image: example matrepr output for a random 2D COO matrix]

Would this require an additional dependency? If not, it's very welcome. If it does require one, I'd make it optional.

I'm definitely on board with the idea of a better repr though.

Apart from matrepr itself, the string formatter depends on tabulate. The HTML and LaTeX formatters have no additional dependencies.

For reference, this should be enough:

def _repr_html_(self):
    # Imported inside the method so matrepr is only needed when an array is displayed
    from matrepr import to_html
    from matrepr.adapters.sparse_driver import PyDataSparseDriver
    return to_html(PyDataSparseDriver.adapt(self), notebook=True)

def __repr__(self):
    from matrepr import to_str
    from matrepr.adapters.sparse_driver import PyDataSparseDriver
    # width_str=0 enables terminal width detection
    return to_str(PyDataSparseDriver.adapt(self), width_str=0, max_cols=9999)
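
For context on why two methods are needed: IPython and Jupyter look for a `_repr_html_` method when displaying an object and use its HTML if present, while the plain REPL falls back to `__repr__`. The stdlib-only sketch below shows that protocol on a hypothetical `TinySparse` class, rendering stored entries into an HTML table with empty cells for missing entries. It stands in for what `to_html` produces and is not matrepr's actual output.

```python
import html


class TinySparse:
    """Hypothetical 2D sparse container holding {(row, col): value} entries."""

    def __init__(self, entries, shape):
        self.entries = dict(entries)
        self.shape = shape

    def __repr__(self):
        # Plain-text form used by the REPL and by repr().
        return f"<TinySparse: shape={self.shape}, nnz={len(self.entries)}>"

    def _repr_html_(self):
        # IPython/Jupyter call this to obtain a rich HTML rendering.
        n_rows, n_cols = self.shape
        rows = []
        for i in range(n_rows):
            cells = []
            for j in range(n_cols):
                value = self.entries.get((i, j))
                # Cells with no stored entry are left empty; values are escaped.
                text = "" if value is None else html.escape(str(value))
                cells.append(f"<td>{text}</td>")
            rows.append("<tr>" + "".join(cells) + "</tr>")
        return "<table>" + "".join(rows) + "</table>"
```

In a notebook, leaving a `TinySparse` instance as the last expression of a cell renders the table; printing it in a terminal shows the one-line summary.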

Would you be willing to make a PR that falls back to the old behaviour when matrepr isn't installed?
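
One way to get that fallback is to resolve the formatter when the repr is requested and use the old summary if the import fails. In the sketch below, `load_rich_formatter` and `repr_with_fallback` are hypothetical helper names; the matrepr calls mirror the snippet earlier in this thread rather than a separately verified API.

```python
def load_rich_formatter():
    """Return a matrepr-based formatter, or None if matrepr is unavailable."""
    try:
        from matrepr import to_str
        from matrepr.adapters.sparse_driver import PyDataSparseDriver
    except ImportError:
        return None  # matrepr (or its pydata/sparse adapter) is not installed

    def formatter(arr):
        # Same call as the __repr__ snippet proposed in this thread.
        return to_str(PyDataSparseDriver.adapt(arr), width_str=0, max_cols=9999)

    return formatter


def repr_with_fallback(arr, summary, formatter):
    """Use the rich formatter when available, otherwise summary(arr)."""
    if formatter is None:
        return summary(arr)
    return formatter(arr)


# On the array class this would be wired up roughly as (sketch):
#     def __repr__(self):
#         return repr_with_fallback(self, _old_summary_repr, load_rich_formatter())
```

Resolving the formatter at call time rather than at module import time means importing the library never fails just because matrepr is missing; Python caches modules after the first import, so repeated lookups stay cheap.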

Yes, I'll submit one soon.

See PR: #605

Here is one question: should empty cells remain empty, or display the array's `fill_value`?

matrepr now supports a `fill_value` argument, so either option is easy. The question is which is better: showing the fill value conveys the array's semantics but hides the sparsity.
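
The trade-off can be seen side by side with a small sketch. `render_1d` and its `show_fill_value` flag are hypothetical and only illustrate the two choices; they are not matrepr's `fill_value` argument.

```python
def render_1d(stored, length, fill_value, show_fill_value):
    """Render a 1D sparse vector given as {index: value}.

    If show_fill_value is True, positions without a stored entry display
    fill_value (the array's semantics); otherwise they stay blank (the
    sparsity pattern).
    """
    empty = str(fill_value) if show_fill_value else ""
    cells = [str(stored[i]) if i in stored else empty for i in range(length)]
    width = max(len(c) for c in cells) if cells else 0
    return "[" + " ".join(c.rjust(width) for c in cells) + "]"


vec = {0: 1.5, 3: 2.0}
print(render_1d(vec, 5, fill_value=0.0, show_fill_value=True))   # [1.5 0.0 0.0 2.0 0.0]
print(render_1d(vec, 5, fill_value=0.0, show_fill_value=False))  # only 1.5 and 2.0 visible
```

With the fill value shown, the output is indistinguishable from a dense vector that happens to contain zeros; with blanks, the stored positions stand out.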

If the fill value is indicated elsewhere I think it's better for them to be empty.

Sounds good; that's the behavior in the PR. The fill value is shown in the summary line.
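
The agreed layout relies on the header carrying the fill value once, so blank cells lose no information. A minimal sketch of such a header, with a hypothetical `summary_line` function and field layout (not the exact summary line used in the PR):

```python
def summary_line(stored, shape, dtype, fill_value):
    """Hypothetical one-line header for a sparse array stored as {coords: value}.

    Reporting fill_value here once is what lets the body leave cells
    without a stored entry blank.
    """
    nnz = len(stored)  # number of explicitly stored entries
    return (f"<sparse array: shape={shape}, dtype={dtype}, "
            f"nnz={nnz}, fill_value={fill_value}>")


print(summary_line({(0, 0): 1.5, (2, 1): 3.0}, (3, 3), "float64", 0.0))
# <sparse array: shape=(3, 3), dtype=float64, nnz=2, fill_value=0.0>
```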