pydata / sparse

Sparse multi-dimensional arrays for the PyData ecosystem

Home Page:https://sparse.pydata.org

Xarray/Numpy/Dask Indexing using a Sparse Array

philipc2 opened this issue · comments

Description
I'm coming from the UXarray package, and I am looking into using sparse to represent integer connectivity arrays.

Some context: UXarray provides functionality for working with Unstructured Grids. Many datasets are flexible meshes, meaning that each face may have a variable number of nodes.

Because of this, it makes a lot of sense to store our connectivity as a sparse array. One example is the face_node_connectivity, which contains the indices of the nodes that make up each face. Currently, we store this as a 2D dense array, with each row containing the node indices. For any face that has fewer than the max number of nodes, the remaining entries in the row are padded with a fill value.

I'd like to be able to use a sparse array to index a 1D array of values, replacing each index stored in the sparse array with the corresponding data value.

import xarray as xr
from sparse import GCXS

# Face nodes as a sparse array
face_nodes_sparse = GCXS.from_numpy(face_node_conn, fill_value=INT_FILL_VALUE)

# some N-D node-centered variable (could be ['time', 'layer', 'n_node'], for example)
data = xr.DataArray(..., )

# indexing using the sparse array as indices, result will be the same shape as the sparse face nodes
face_node_data = data.isel(n_node=face_nodes_sparse)

Is there any recommended approach or existing functionality that would allow for this?
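For reference, here is a minimal sketch of the result being asked for, written with plain NumPy and a masked array instead of a sparse array. The fill value of -1, the two-face mesh, and the node values are all made up for illustration; this is not UXarray's or sparse's implementation.

```python
import numpy as np

INT_FILL_VALUE = -1  # assumed fill value; the real constant may differ

# Two faces: a triangle and a quad, padded to n_max_face_nodes = 4
face_node_conn = np.array([[0, 1, 2, INT_FILL_VALUE],
                           [1, 3, 4, 2]])

# One value per node (n_node = 5); illustrative data only
node_data = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Mark real entries, and replace padded slots with a harmless index (0)
# so that fancy indexing does not pick up the last node via index -1
valid = face_node_conn != INT_FILL_VALUE
safe_idx = np.where(valid, face_node_conn, 0)

# Gather node values; the result has the same shape as the connectivity
gathered = node_data[safe_idx]

# Hide the padded slots so later reductions ignore them
face_node_data = np.ma.masked_array(gathered, mask=~valid)
```

For an N-D variable whose last dimension is n_node, `node_data[..., safe_idx]` should gather along that dimension in the same way, with the mask broadcast against the leading dimensions.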

Usage

I'd like to perform reductions such as mean, max, and min on the resulting sparse data array.

# mean of the nodes that make up each face
face_node_data.mean()

We currently have an implementation of this in UXarray, but it uses the dense array and a for loop, which is not ideal.
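To make the comparison concrete, here is a rough sketch of a loop-based per-face mean next to a vectorized one. Both functions are hypothetical stand-ins written for this example (the loop is not UXarray's actual code), and the fill value and data are assumed.

```python
import numpy as np

INT_FILL_VALUE = -1  # assumed fill value

face_node_conn = np.array([[0, 1, 2, INT_FILL_VALUE],
                           [1, 3, 4, 2]])
node_data = np.array([10.0, 20.0, 30.0, 40.0, 50.0])


def face_mean_loop(conn, values, fill):
    """Per-face mean using a Python loop over the rows (slow for large meshes)."""
    out = np.empty(conn.shape[0])
    for i, row in enumerate(conn):
        out[i] = values[row[row != fill]].mean()
    return out


def face_mean_vectorized(conn, values, fill):
    """Per-face mean without a Python loop: zero out padded slots, then sum / count."""
    valid = conn != fill
    safe_idx = np.where(valid, conn, 0)              # avoid indexing with the fill value
    gathered = np.where(valid, values[safe_idx], 0.0)
    return gathered.sum(axis=1) / valid.sum(axis=1)


loop_result = face_mean_loop(face_node_conn, node_data, INT_FILL_VALUE)
vec_result = face_mean_vectorized(face_node_conn, node_data, INT_FILL_VALUE)
```

The vectorized version relies on every face having at least one valid node, which holds here since the minimum is a triangle.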

Any feedback or guidance would be great!

Pinging @dcherian, since he's familiar with both UXarray & Sparse

In its current form, COO is better supported than GCXS, and all operations you describe are supported.

Great! Let me look into it and post an update. Thank you for the quick response!

We'd be interested to know more about your use-case; so please feel free to comment here with any roadblocks you run into and we'd be happy to address them as best we can. Performance isn't ideal right now, but we're working on it; see #618.

I appreciate the help!

The connectivity that we use (face_node_connectivity) has the following properties:

  • Each face has n nodes, up to n_max_face_nodes, with the minimum number being 3 (a triangle)
  • We know ahead of time how many nodes each face has, stored in a 1D array n_nodes_per_face
  • Faces with fewer than n_max_face_nodes nodes are padded with a fill value to fill the dense array

I'd like to have a sparse array that can be indexed row-wise to access only the non-fill-value entries (e.g. sparse_conn[0]).
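One way to picture this kind of row-wise access is a CSR-style layout built from the properties listed above: a flat array of the valid node indices plus per-face offsets derived from n_nodes_per_face. This is only an illustrative NumPy sketch with made-up data and an assumed fill value of -1; `face_nodes` is a hypothetical helper, not part of sparse or UXarray.

```python
import numpy as np

INT_FILL_VALUE = -1  # assumed fill value

face_node_conn = np.array([[0, 1, 2, INT_FILL_VALUE],
                           [1, 3, 4, 2]])
n_nodes_per_face = np.array([3, 4])  # known ahead of time

# Flat array of valid node indices, in row-major (face-by-face) order
valid = face_node_conn != INT_FILL_VALUE
flat_nodes = face_node_conn[valid]

# offsets[i]:offsets[i + 1] is the slice of flat_nodes belonging to face i
offsets = np.concatenate(([0], np.cumsum(n_nodes_per_face)))


def face_nodes(i):
    """Return only the real (non-fill) node indices of face i."""
    return flat_nodes[offsets[i]:offsets[i + 1]]
```

Here `face_nodes(0)` plays the role of `sparse_conn[0]`: it returns a row whose length equals that face's node count, with no padding.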

I've been able to successfully create a COO and a GCXS from the dense array with fill values, but I'm not certain how to use either one to index my data variable.

To be clear, do you want something like:

xr.DataArray(
    dims=...,
    data=sparse.COO(
        coords=selector.coords,
        data=data_array.data.isel(...=selector.data),
    ),
)

If so, can you open an issue over at Xarray with very minimal example please?

Closing since it seems the help needed was provided.