The NaN problem

Question

shawnsmithdev opened this issue 2 years ago · comments

One of the defining characteristics of zermelo is that it supports constraints.Float types, likely a rarity among radix sort libraries given the bit-flipping and type-casting shenanigans you have to do to make it happen. That means we need to deal with NaN values.

NaNs are lots of fun. By their own admission, they are not numbers. You can't sort them because you can't even compare them, much less consider their digits by radix. You can only have a policy on what to do with them when present.

zermelo currently does a full linear scan through the slice before anything else, looking for NaNs and moving them to the front, ahead of all other elements. This only happens in the constraints.Float code path. It then does its flip-sort-flip magic on the remainder of the slice. This behavior was chosen because it is also how sort.Float64s() handles NaNs.

But comes now slices.Sort(), the new generic comparison sort in golang.org/x/exp. I fully expect constraints, slices, and maps to show up in the stdlib, probably in go1.19 if nothing goes wrong. We shall see, but that is my assumption. I do know slices.Sort() is much faster than the sort package, likely due to less function pointer dereferencing and more inlining, and probably also just a better comparison sort implementation.

I'm not quite sure if there is a defined behavior at all for NaNs in slices.Sort, but it seems unlikely given that it is a generic comparison sort on constraints.Ordered. What I am sure of is that it isn't the same as sort.

Which is all a long way of saying we need an official policy, documented and tested, about what is done with NaNs.

Shawn Smith · Answer 1 · Sat May 07 2022 05:13:08 GMT+0800 (China Standard Time)

I've decided to continue putting NaNs up front, since that was the behavior before and it is fast, running in linear time. v1.5.2 now includes tests for this. I will want to mention it in the actual README later when I do a better update to highlight the generics code.

Shawn Smith · Answer 2 · Wed May 18 2022 10:06:21 GMT+0800 (China Standard Time)

This is basically resolved. Improved documentation will probably happen in the 2.0 release and be backported later.