esheldon / esutil

A variety of python utilities focusing on numerical, scientific, and astrophysical computing

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Reverse indices are not properly sorted (and can change from version to version)

erykoff opened this issue · comments

The following code snippet (a) doesn't return sorted indices and (b) results in different ordering from numpy 1.24 to numpy 1.26 on x86 processors.

import numpy as np
import esutil


arr = np.zeros(1000, dtype=np.int64)
arr[500: ] = 1

h, rev = esutil.stat.histogram(arr, rev=True)

print(rev[rev[0]: rev[1]])
print(rev[rev[1]: rev[2]])

The fix is that the histogram call to argsort should use kind="stable" which properly preserves sort order.

I don't want to fix this yet to allow my code to properly prepare for the different (but correct!) ordering.

The docs say the underlying implementation will depend on data type, so we should make sure we get good coverage in the unit tests

@erykoff are you ready for a fix for this?

I haven't done anything about this because I didn't want to deal with writing the tests. 😞

Reading the docs https://numpy.org/doc/stable/reference/generated/numpy.sort.html, it seems that the mention of datatype for stable is whether timsort is used (I think for floats, etc) or radix sort (for ints, strings, etc) is used. But the key is that it is guaranteed to maintain the input order (that is the stability). Not to say testing isn't good! But that's an implementation detail. I think it's important that this actually return stable outputs.

I'm happy to have the change when you are ready