v2 - Generics
shawnsmithdev opened this issue
If generics, when they are released, turn out to perform well, I'm going to rewrite this library to use them and get the line count way down.
This issue tracks the research into performance and, if it looks good, actual implementation.
Don't expect anything here until generics are actually released.
Played a bit with this in 1.17, but it will be easier to just wait for the 1.18 beta, as the 1.17 test tooling breaks on generics syntax.
I've spent the last few days working on this with go1.18beta1. For expediency I've deleted the tests and benchmarks, and I will be rewriting them as well to take advantage of generics. I played around with the built-in fuzzing support, but it doesn't seem to work well with the kind of numeric slices this library is meant to work with.
"detect" is a bit-fiddling function, run on every sort, that determines type size without using reflection. So there's a unit test for correctness that also logs how fast detect ran.
I'm getting rid of all the subpackages, as they just aren't needed anymore. Moving to generics is going to turn this ~2000 lines of copy pasta into ~500 lines of much better, cleaner code.
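To make the "detect" idea concrete, here is a minimal sketch of how type size could be found without reflection. It is not the code from this library: the name `detectSketch` and the local `Integer` constraint (a stand-in for `constraints.Integer`) are assumptions made for illustration.

```go
package main

import "fmt"

// Integer is a local stand-in for constraints.Integer so that this
// sketch has no dependencies. It is an assumption, not the library's type.
type Integer interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr
}

// detectSketch (hypothetical name) reports the width of T in bits and
// whether T is signed, using only shifts and comparisons on values of T.
func detectSketch[T Integer]() (bits uint, signed bool) {
	ones := ^T(0) // all bits set: -1 for signed types, the maximum for unsigned
	signed = ones < 0
	for x := ones; x != 0; x <<= 1 {
		bits++ // each left shift pushes one set bit off the top
	}
	return bits, signed
}

func main() {
	bits, signed := detectSketch[int16]()
	fmt.Println("int16:", bits, signed)
	bits, signed = detectSketch[uint64]()
	fmt.Println("uint64:", bits, signed)
}
```

The loop runs once per bit, so the cost is a few dozen cheap operations per call, which is roughly in line with the nanosecond timings logged by the test.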
Now the bad news: Right now the only way I can see to do floats with generic code is to have two uintxx buffers. This is ... not ideal. But every other way I've tried to do it using the input as working space, as is done with integers and currently with floats, has been way too slow. Fixing this is my main problem right now.
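For readers unfamiliar with the problem, the copying approach can be sketched roughly as follows, here for float32 only. The names `floatKey`, `keyFloat` and `sortFloat32Copying` are hypothetical, and `sort.Slice` stands in for the radix sort, which is where the second unsigned buffer would be used as scratch space. NaNs are assumed to have been removed beforehand.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

const signBit = uint32(1) << 31

// floatKey maps a float32 to a uint32 whose unsigned order matches
// the float's numeric order (NaNs excluded).
func floatKey(f float32) uint32 {
	u := math.Float32bits(f)
	if u&signBit != 0 {
		return ^u // negative: flip every bit so larger magnitudes sort first
	}
	return u | signBit // non-negative: set the top bit to sort after negatives
}

// keyFloat is the inverse of floatKey.
func keyFloat(k uint32) float32 {
	if k&signBit != 0 {
		return math.Float32frombits(k &^ signBit)
	}
	return math.Float32frombits(^k)
}

// sortFloat32Copying copies x into keys as sortable integers, sorts
// them, and writes the floats back. len(keys) must be >= len(x).
func sortFloat32Copying(x []float32, keys []uint32) {
	y := keys[:len(x)]
	for i, f := range x {
		y[i] = floatKey(f)
	}
	// Stand-in for the radix sort; a radix sort would need one more buffer.
	sort.Slice(y, func(i, j int) bool { return y[i] < y[j] })
	for i, k := range y {
		x[i] = keyFloat(k)
	}
}

func main() {
	x := []float32{3.5, -1, 0, -7.25, 2}
	sortFloat32Copying(x, make([]uint32, len(x)))
	fmt.Println(x)
}
```

The cost of this shape is the extra memory and the two copy passes, which is why doing the work in place on the input would be preferable if it can be made fast.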
=== RUN TestDetect
zermelo_test.go:35: uint: detect in 126ns
zermelo_test.go:35: uint8: detect in 77ns
zermelo_test.go:35: uint16: detect in 144ns
zermelo_test.go:35: uint32: detect in 54ns
zermelo_test.go:35: uint64: detect in 53ns
zermelo_test.go:35: uintptr: detect in 55ns
zermelo_test.go:35: int: detect in 67ns
zermelo_test.go:35: int8: detect in 127ns
zermelo_test.go:35: int16: detect in 78ns
zermelo_test.go:35: int32: detect in 67ns
zermelo_test.go:35: int64: detect in 62ns
--- PASS: TestDetect (0.00s)
=== RUN TestIntSorter
zsorter_test.go:84: []int: Won 100 in a row at size 77
--- PASS: TestIntSorter (0.00s)
=== RUN TestUintSorter
zsorter_test.go:84: []uint: Won 100 in a row at size 70
--- PASS: TestUintSorter (0.00s)
=== RUN TestInt64Sorter
zsorter_test.go:84: []int64: Won 100 in a row at size 78
--- PASS: TestInt64Sorter (0.00s)
=== RUN TestUint64Sorter
zsorter_test.go:84: []uint64: Won 100 in a row at size 79
--- PASS: TestUint64Sorter (0.00s)
=== RUN TestInt32Sorter
zsorter_test.go:84: []int32: Won 100 in a row at size 38
--- PASS: TestInt32Sorter (0.00s)
=== RUN TestUint32Sorter
zsorter_test.go:84: []uint32: Won 100 in a row at size 36
--- PASS: TestUint32Sorter (0.00s)
=== RUN TestInt16Sorter
zsorter_test.go:84: []int16: Won 100 in a row at size 15
--- PASS: TestInt16Sorter (0.00s)
=== RUN TestUint16Sorter
zsorter_test.go:84: []uint16: Won 100 in a row at size 19
--- PASS: TestUint16Sorter (0.00s)
=== RUN TestInt8Sorter
zsorter_test.go:84: []int8: Won 100 in a row at size 8
--- PASS: TestInt8Sorter (0.00s)
=== RUN TestUint8Sorter
zsorter_test.go:84: []uint8: Won 100 in a row at size 9
--- PASS: TestUint8Sorter (0.00s)
=== RUN TestUintptrSorter
zsorter_test.go:84: []uintptr: Won 100 in a row at size 78
--- PASS: TestUintptrSorter (0.00s)
=== RUN TestFloat32Sorter
zsorter_test.go:121: []float32: Won 100 in a row at size 41
--- PASS: TestFloat32Sorter (0.00s)
=== RUN TestFloat64Sorter
zsorter_test.go:121: []float64: Won 100 in a row at size 82
--- PASS: TestFloat64Sorter (0.01s)
=== RUN TestNaNs
--- PASS: TestNaNs (0.00s)
PASS
ok github.com/shawnsmithdev/zermelo 0.025s
Breaking changes: the subpackages are removed, and Sorter changes.
Here's a rough draft of the new upper level exported API.
// Numerical is a constraint that permits any integer or floating point type
type Numerical interface {
	constraints.Integer | constraints.Float
}
// Sort attempts to sort x.
//
// If x is a supported slice type, this library will be used to sort it. Otherwise,
// if x implements sort.Interface it will passthrough to the sort.Sort() algorithm.
// Returns an error on unsupported types.
func Sort(x any) error
func SortIntegers[T constraints.Integer](x []T)
func SortIntegersBYOB[T constraints.Integer](x, buffer []T)
func SortFloats[T constraints.Float](x []T)
// hopefully I can change this one
func SortFloatsBYOB[F constraints.Float, I constraints.Unsigned](x []F, ybuf, zbuf []I)
// Sorter can sort slices
type Sorter[T Numerical] interface {
	// Sort sorts x
	Sort(x []T)
}
func NewIntSorter[I constraints.Integer]() Sorter[I]
func NewFloatSorter[F constraints.Float, U constraints.Unsigned]() Sorter[F]
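As an illustration of what a "bring your own buffer" signature implies, here is a minimal sketch of a byte-wise LSD radix sort for unsigned integers that uses a caller-supplied scratch slice. It is not the library's implementation: `radixSortBYOB` and the local `Unsigned` constraint are assumed names, and signed integers (which would need their sign bit handled) are left out.

```go
package main

import "fmt"

// Unsigned is a local stand-in for constraints.Unsigned (an assumption
// made so the sketch compiles without dependencies).
type Unsigned interface {
	~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr
}

// radixSortBYOB (hypothetical name) sorts x in ascending order, one byte
// per pass, using buffer as scratch. len(buffer) must be >= len(x).
func radixSortBYOB[T Unsigned](x, buffer []T) {
	if len(buffer) < len(x) {
		panic("radixSortBYOB: buffer is shorter than x")
	}
	if len(x) < 2 {
		return
	}
	// Count the bits in T by shifting an all-ones value until it is empty.
	var bits uint
	for v := ^T(0); v != 0; v >>= 1 {
		bits++
	}
	src, dst := x, buffer[:len(x)]
	for shift := uint(0); shift < bits; shift += 8 {
		// Histogram of the current byte.
		var offsets [256]int
		for _, v := range src {
			offsets[uint8(v>>shift)]++
		}
		// Turn counts into starting offsets (exclusive prefix sum).
		sum := 0
		for i, count := range offsets {
			offsets[i] = sum
			sum += count
		}
		// Stable scatter into the other slice.
		for _, v := range src {
			b := uint8(v >> shift)
			dst[offsets[b]] = v
			offsets[b]++
		}
		src, dst = dst, src
	}
	// After an odd number of passes the result sits in buffer.
	if (bits/8)%2 == 1 {
		copy(x, src)
	}
}

func main() {
	x := []uint32{70000, 3, 255, 256, 0, 4000000000, 3}
	buf := make([]uint32, len(x))
	radixSortBYOB(x, buf)
	fmt.Println(x)
}
```

Letting the caller own the buffer means repeated sorts of similarly sized slices need no further allocation, which is presumably also what a reusable Sorter would do internally.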
I've managed to get rid of the extra buffer and Unsigned stuff in the exported API, but at what cost?
package zermelo
import (
	"constraints"
	"reflect"
	"runtime"
	"unsafe"
)
// unsafeFlipSortFlip converts float slices to unsigned, flips some bits to allow sorting, sorts and unflips.
// F and U must be exactly the same size, and len(buf) must be >= len(x)
// This will not work if NaNs are present in x. Remove them first.
func unsafeFlipSortFlip[F constraints.Float, U constraints.Unsigned](x, buf []F, size uint) {
	y := unsafeSliceConvert[F, U](x)
	z := unsafeSliceConvert[F, U](buf)
	// flip
	allBits := ^U(0)
	topBit := U(1) << (size - 1)
	topBit = topBit ^ (topBit << 1)
	for idx, val := range y {
		if val&topBit == topBit {
			y[idx] = val ^ allBits
		} else {
			y[idx] = val ^ topBit
		}
	}
	// sort
	SortIntegersBYOB(y, z)
	// this may be needed to keep buffer out of gc while in use
	runtime.KeepAlive(buf)
	// unflip
	for idx, val := range y {
		if val&topBit == topBit {
			y[idx] = val ^ topBit
		} else {
			y[idx] = val ^ allBits
		}
	}
}
// unsafeSliceConvert takes a slice of one type and returns a slice
// of another type using the same memory for the backing array.
// If x goes out of scope, the returned slice becomes invalid.
// A and B absolutely must be exactly the same size!
func unsafeSliceConvert[A, B Numerical](x []A) []B {
	var result []B
	xHeader := (*reflect.SliceHeader)(unsafe.Pointer(&x))
	resultHeader := (*reflect.SliceHeader)(unsafe.Pointer(&result))
	resultHeader.Data = xHeader.Data
	resultHeader.Len = xHeader.Len
	resultHeader.Cap = xHeader.Cap
	return result
}
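A possible alternative to the reflect.SliceHeader approach, shown only as a sketch: `unsafe.Slice` (available since Go 1.17) builds the reinterpreted slice directly from a pointer and a length. The name `reinterpretSketch` and the `SameWidth` constraint are assumptions for illustration, and unlike the code above this version sets the capacity of the result to the length of the input.

```go
package main

import (
	"fmt"
	"unsafe"
)

// SameWidth is a hypothetical constraint limiting the sketch to the
// float types and the unsigned types they are reinterpreted as.
type SameWidth interface {
	~uint32 | ~uint64 | ~float32 | ~float64
}

// reinterpretSketch returns a []B that shares x's backing array.
// A and B must have the same size; this is checked at run time.
// The result is only valid while x's backing array is alive.
func reinterpretSketch[A, B SameWidth](x []A) []B {
	var a A
	var b B
	if unsafe.Sizeof(a) != unsafe.Sizeof(b) {
		panic("reinterpretSketch: element sizes differ")
	}
	if len(x) == 0 {
		return nil
	}
	// Point at the first element and view len(x) elements as B.
	return unsafe.Slice((*B)(unsafe.Pointer(&x[0])), len(x))
}

func main() {
	f := []float32{1, -2}
	u := reinterpretSketch[float32, uint32](f)
	fmt.Printf("%#x %#x\n", u[0], u[1])
	u[0] = 0x40000000 // the bit pattern of float32(2)
	fmt.Println(f[0])
}
```

Because the result holds a real pointer into the backing array rather than a uintptr copied between headers, the garbage collector can see the reference, which may make an explicit runtime.KeepAlive unnecessary in this shape.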
Thanks for this! I have a more-or-less immediate need for the generic version of this library. Do you have a branch I could use? (Specifically, just for integers.)
Let me iron out the tests, and yes, I can do that. Sorry it took me so long to notice this!
One of the things that was hanging me up is:
https://pkg.go.dev/golang.org/x/exp/slices#Sort
is just quite a bit faster than the old stdlib sort, and it was making me think the new zermelo code was super slow. But the bar is just raised now. I may adjust the constant for when it switches to radix sort to be 384 or 512 instead of 256...
@sbromberger let me know if that helps you out or if you find any problems.