block_reduce type functions

rabernat opened this issue · comments

I'm in need of a gufunc that does something like scikit-image's block_reduce function, but in n dimensions.

As a simple example, I want something like this:

>>> x = np.ones(8)
>>> block_reduce(x, 2, how='sum')
array([2., 2., 2., 2.])

I would like to generalize this to n dimensions, support various reduction operations, and possibly also provide weights.
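
For reference, the exact-multiple case can be written in plain NumPy by splitting each axis into (blocks, block_size) pairs; block_reduce_nd and its how= option below are hypothetical names for illustration, not an existing numbagg API:

import numpy as np

def block_reduce_nd(x, block, how='sum'):
    # Hypothetical sketch: reduce non-overlapping blocks along every
    # axis; requires each axis length to be an exact multiple of the
    # corresponding block size.
    block = np.broadcast_to(block, (x.ndim,))
    new_shape = []
    for n, b in zip(x.shape, block):
        assert n % b == 0, 'axis length must be a multiple of block size'
        new_shape.extend([n // b, b])
    # Reduce over the per-block axes (the odd positions).
    func = {'sum': np.nansum, 'mean': np.nanmean}[how]
    return func(x.reshape(new_shape), axis=tuple(range(1, 2 * x.ndim, 2)))

With this, block_reduce_nd(np.ones(8), 2) returns array([2., 2., 2., 2.]), matching the example above.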

Maybe numbagg already does this? If not, is it in scope?

I think this would be in scope for numbagg.

scikit-image's block_reduce does work in n dimensions, e.g., block_reduce(np.ones((5, 2, 2)), (1, 2, 2), np.sum).shape == (5, 1, 1). But I agree that it has limitations, e.g., the array gets padded with a constant fill value if its shape is not an exact multiple of the block size, which can skew the results.
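
For concreteness (using scikit-image's skimage.measure.block_reduce, which pads with cval=0 by default):

import numpy as np
from skimage.measure import block_reduce

# Shapes that divide evenly reduce cleanly:
block_reduce(np.ones((5, 2, 2)), (1, 2, 2), np.sum).shape  # (5, 1, 1)

# A length-5 axis with block size 2 is first padded to length 6 with
# zeros, which skews the mean in the final block:
block_reduce(np.ones(5), (2,), np.mean)  # array([1. , 1. , 0.5])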

A more flexible version might let you provide block IDs along each axis, sort of a multi-dimensional version of group_nanmean(). It might be a little tricky to squeeze into a gufunc, since you would need a variable number of label arguments depending on the number of axes, e.g.,

from numba import float64, guvectorize, int64

@guvectorize(
    [(float64[:, :], int64[:], int64[:], float64[:, :])],
    '(i,j),(i),(j)->(k,m)',
)
def block_nanmean_2d(values, labels_x, labels_y, out):
    # accumulate NaN-aware sums and counts per (label_x, label_y) pair
    ...

This function could probably be built dynamically for an arbitrary number of label dimensions, but it would be a bit of a pain.
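
For instance, the layout string could be generated from the number of label dimensions; make_signature below is a hypothetical helper sketching that step:

def make_signature(ndim):
    # e.g. make_signature(2) -> '(i0,i1),(i0),(i1)->(o0,o1)'
    core = ','.join(f'i{d}' for d in range(ndim))
    labels = ','.join(f'(i{d})' for d in range(ndim))
    out = ','.join(f'o{d}' for d in range(ndim))
    return f'({core}),{labels}->({out})'

A matching gufunc would then be compiled once per number of dimensions via guvectorize.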

Maybe there's something clever you could do to avoid this. E.g., if you're willing to allocate a full array of label indices for each point, you could make the signature something like (i,j),(i,j,2)->(k,m).
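
A NaN-aware NumPy reference for that stacked-labels layout might look like the following; block_nanmean and its explicit out_shape argument are hypothetical, and a numba version would use the same accumulation logic:

import numpy as np

def block_nanmean(values, labels, out_shape):
    # Hypothetical sketch: `labels` has shape values.shape + (values.ndim,),
    # giving each point's integer block index along every output axis.
    flat_labels = labels.reshape(-1, values.ndim)
    # Collapse per-axis block indices into flat output positions.
    flat_idx = np.ravel_multi_index(tuple(flat_labels.T), out_shape)
    v = values.ravel()
    valid = ~np.isnan(v)
    size = int(np.prod(out_shape))
    sums = np.bincount(flat_idx[valid], weights=v[valid], minlength=size)
    counts = np.bincount(flat_idx[valid], minlength=size)
    with np.errstate(invalid='ignore'):
        return (sums / counts).reshape(out_shape)  # empty blocks -> NaN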