shawnsmithdev / zermelo

A radix sorting library for Go (golang)

v2 - Generics

shawnsmithdev opened this issue · comments

If generics, when they are released, turn out to perform well, I'm going to rewrite this library to use them and get the line count way down.

This issue tracks the research into performance and, if it looks good, actual implementation.

Don't expect anything here until generics are actually released.

I played with this a bit in 1.17, but it will be easier to just wait for the 1.18 beta, as the 1.17 test tooling breaks on generics syntax.

I've spent the last few days working on this with go1.18beta1. For expediency I've deleted the tests and benchmarks and will be rewriting them as well to take advantage of generics. I played around with the built-in fuzzing support, but it doesn't seem to work well with the kind of numeric slices this library is meant to sort.

"detect" is a bit-fiddling function that runs on every sort to determine the type size without using reflection. There's a unit test for its correctness that also logs how long each call took.

I'm getting rid of all the subpackages, as they just aren't needed anymore. Moving to generics is going to turn this ~2000 lines of copy pasta into ~500 lines of much better, cleaner code.
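For readers wondering what a reflection-free size check like "detect" might look like, here is a minimal sketch. It is not the library's code: the name `detectBits` and the local `integer` constraint are assumptions made for illustration. It counts how many times a 1 can be shifted left before it overflows to zero, and checks signedness by comparing the all-ones value against zero.

```go
package main

import "fmt"

// integer is a local stand-in for constraints.Integer, so this sketch
// needs nothing outside the standard library.
type integer interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr
}

// detectBits is a hypothetical helper (not zermelo's actual detect).
// It reports the bit width of T and whether T is signed, using only
// integer arithmetic.
func detectBits[T integer]() (bits uint, signed bool) {
	var zero T
	// All bits set is -1 for signed types and the maximum for unsigned ones.
	signed = ^zero < zero
	// Shift a single 1 left until it falls off the top of the type.
	for v := T(1); v != 0; v <<= 1 {
		bits++
	}
	return bits, signed
}

func main() {
	b8, s8 := detectBits[int8]()
	b32, s32 := detectBits[uint32]()
	fmt.Println(b8, s8)   // 8 true
	fmt.Println(b32, s32) // 32 false
}
```

A loop like this costs a few dozen shifts per call, which is at least in the same ballpark as the nanosecond timings logged by the test.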

Now the bad news: Right now the only way I can see to do floats with generic code is to have two uintxx buffers. This is ... not ideal. But every other way I've tried to do it using the input as working space, as is done with integers and currently with floats, has been way too slow. Fixing this is my main problem right now.
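To make the float problem concrete, here is a sketch of the usual key transform, written for `float64` only and without generics. The helper names (`floatKey`, `keyFloat`, `sortFloatsViaKeys`) are made up for this example, and `sort.Slice` stands in for the radix sort. The floats are copied into an unsigned key buffer; a real radix sort would also need a second unsigned scratch buffer of the same length, which is presumably where the two uintxx buffers come from.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// floatKey maps a float64 to a uint64 whose unsigned order matches the
// float order. Negative values get every bit inverted; non-negative
// values only get the sign bit set.
func floatKey(f float64) uint64 {
	bits := math.Float64bits(f)
	if bits>>63 == 1 {
		return ^bits
	}
	return bits | (1 << 63)
}

// keyFloat is the inverse of floatKey.
func keyFloat(k uint64) float64 {
	if k>>63 == 1 {
		return math.Float64frombits(k &^ (1 << 63))
	}
	return math.Float64frombits(^k)
}

// sortFloatsViaKeys sorts x by copying it into a key buffer, sorting the
// keys as unsigned integers, and copying the results back.
// NaNs are assumed to have been removed beforehand.
func sortFloatsViaKeys(x []float64) {
	keys := make([]uint64, len(x))
	for i, f := range x {
		keys[i] = floatKey(f)
	}
	// Stand-in for an unsigned radix sort over keys.
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	for i, k := range keys {
		x[i] = keyFloat(k)
	}
}

func main() {
	x := []float64{3.5, -1, 0, -2.25, 1e10, -1e-3}
	sortFloatsViaKeys(x)
	fmt.Println(x)
}
```

The extra copies in and out of the key buffer are the cost being complained about here; sorting in place requires reinterpreting the float memory as unsigned integers instead.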

=== RUN   TestDetect
    zermelo_test.go:35: uint: detect in 126ns
    zermelo_test.go:35: uint8: detect in 77ns
    zermelo_test.go:35: uint16: detect in 144ns
    zermelo_test.go:35: uint32: detect in 54ns
    zermelo_test.go:35: uint64: detect in 53ns
    zermelo_test.go:35: uintptr: detect in 55ns
    zermelo_test.go:35: int: detect in 67ns
    zermelo_test.go:35: int8: detect in 127ns
    zermelo_test.go:35: int16: detect in 78ns
    zermelo_test.go:35: int32: detect in 67ns
    zermelo_test.go:35: int64: detect in 62ns
--- PASS: TestDetect (0.00s)
=== RUN   TestIntSorter
    zsorter_test.go:84: []int: Won 100 in a row at size 77
--- PASS: TestIntSorter (0.00s)
=== RUN   TestUintSorter
    zsorter_test.go:84: []uint: Won 100 in a row at size 70
--- PASS: TestUintSorter (0.00s)
=== RUN   TestInt64Sorter
    zsorter_test.go:84: []int64: Won 100 in a row at size 78
--- PASS: TestInt64Sorter (0.00s)
=== RUN   TestUint64Sorter
    zsorter_test.go:84: []uint64: Won 100 in a row at size 79
--- PASS: TestUint64Sorter (0.00s)
=== RUN   TestInt32Sorter
    zsorter_test.go:84: []int32: Won 100 in a row at size 38
--- PASS: TestInt32Sorter (0.00s)
=== RUN   TestUint32Sorter
    zsorter_test.go:84: []uint32: Won 100 in a row at size 36
--- PASS: TestUint32Sorter (0.00s)
=== RUN   TestInt16Sorter
    zsorter_test.go:84: []int16: Won 100 in a row at size 15
--- PASS: TestInt16Sorter (0.00s)
=== RUN   TestUint16Sorter
    zsorter_test.go:84: []uint16: Won 100 in a row at size 19
--- PASS: TestUint16Sorter (0.00s)
=== RUN   TestInt8Sorter
    zsorter_test.go:84: []int8: Won 100 in a row at size 8
--- PASS: TestInt8Sorter (0.00s)
=== RUN   TestUint8Sorter
    zsorter_test.go:84: []uint8: Won 100 in a row at size 9
--- PASS: TestUint8Sorter (0.00s)
=== RUN   TestUintptrSorter
    zsorter_test.go:84: []uintptr: Won 100 in a row at size 78
--- PASS: TestUintptrSorter (0.00s)
=== RUN   TestFloat32Sorter
    zsorter_test.go:121: []float32: Won 100 in a row at size 41
--- PASS: TestFloat32Sorter (0.00s)
=== RUN   TestFloat64Sorter
    zsorter_test.go:121: []float64: Won 100 in a row at size 82
--- PASS: TestFloat64Sorter (0.01s)
=== RUN   TestNaNs
--- PASS: TestNaNs (0.00s)
PASS
ok  	github.com/shawnsmithdev/zermelo	0.025s

The breaking changes are the removal of the subpackages and changes to Sorter. Here's a rough draft of the new top-level exported API.

// Numerical is a constraint that permits any integer or floating point type
type Numerical interface {
	constraints.Integer | constraints.Float
}

// Sort attempts to sort x.
//
// If x is a supported slice type, this library will be used to sort it. Otherwise,
// if x implements sort.Interface it will passthrough to the sort.Sort() algorithm.
// Returns an error on unsupported types.
func Sort(x any) error

func SortIntegers[T constraints.Integer](x []T)

func SortIntegersBYOB[T constraints.Integer](x, buffer []T)

func SortFloats[T constraints.Float](x []T)

// hopefully I can change this one
func SortFloatsBYOB[F constraints.Float, I constraints.Unsigned](x []F, ybuf, zbuf []I)

// Sorter can sort slices
type Sorter[T Numerical] interface {
	// Sort sorts x
	Sort(x []T)
}

func NewIntSorter[I constraints.Integer]() Sorter[I]

func NewFloatSorter[F constraints.Float, U constraints.Unsigned]() Sorter[F]
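As an illustration of what a "BYOB" (bring your own buffer) signature buys you, here is a minimal least-significant-digit radix sort for `[]uint32` that uses a caller-supplied scratch buffer. This is a sketch under assumptions, not zermelo's implementation: the name `radixSortBYOB` is hypothetical, it is not generic, and it skips optimizations a real library would likely include.

```go
package main

import "fmt"

// radixSortBYOB is a hypothetical stand-in for a BYOB-style sort.
// It sorts x one byte at a time (LSD radix sort), using buffer as
// scratch space. len(buffer) must be >= len(x).
func radixSortBYOB(x, buffer []uint32) {
	if len(buffer) < len(x) {
		panic("radixSortBYOB: buffer is shorter than x")
	}
	from, to := x, buffer[:len(x)]
	for shift := uint(0); shift < 32; shift += 8 {
		// Count how many values fall into each of the 256 buckets.
		var counts [256]int
		for _, v := range from {
			counts[(v>>shift)&0xFF]++
		}
		// Turn the counts into starting offsets for each bucket.
		offset := 0
		for i, c := range counts {
			counts[i] = offset
			offset += c
		}
		// Stable scatter into the other slice.
		for _, v := range from {
			b := (v >> shift) & 0xFF
			to[counts[b]] = v
			counts[b]++
		}
		from, to = to, from
	}
	// Four passes is an even number of swaps, so the result is back in x.
}

func main() {
	x := []uint32{70000, 3, 4000000000, 255, 256, 0, 3}
	buf := make([]uint32, len(x))
	radixSortBYOB(x, buf)
	fmt.Println(x) // [0 3 3 255 256 70000 4000000000]
}
```

Passing the buffer in lets a caller that sorts many slices reuse one allocation, which is the point of a Sorter that holds its own buffer.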

I've managed to get rid of the extra buffer and Unsigned stuff in the exported API, but at what cost?

package zermelo

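import (
	"reflect"
	"runtime"
	"unsafe"

	// In go1.18beta1 this was the standard library package "constraints";
	// it moved to golang.org/x/exp before the Go 1.18 release.
	"golang.org/x/exp/constraints"
)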

// unsafeFlipSortFlip converts float slices to unsigned, flips some bits to allow sorting, sorts and unflips.
// F and U must be exactly the same size, and len(buf) must be >= len(x)
// This will not work if NaNs are present in x. Remove them first.
func unsafeFlipSortFlip[F constraints.Float, U constraints.Unsigned](x, buf []F, size uint) {
	y := unsafeSliceConvert[F, U](x)
	z := unsafeSliceConvert[F, U](buf)

	// flip
	allBits := ^U(0)
	topBit := U(1) << (size - 1)
	topBit = topBit ^ (topBit << 1)

	for idx, val := range y {
		if val&topBit == topBit {
			y[idx] = val ^ allBits
		} else {
			y[idx] = val ^ topBit
		}
	}

	// sort
	SortIntegersBYOB(y, z)

	// this may be needed to keep buffer out of gc while in use
	runtime.KeepAlive(buf)

	// unflip
	for idx, val := range y {
		if val&topBit == topBit {
			y[idx] = val ^ topBit
		} else {
			y[idx] = val ^ allBits
		}
	}
}

// unsafeSliceConvert takes a slice of one type and returns a slice
// of another type using the same memory for the backing array.
// If x goes out of scope, the returned slice becomes invalid.
// A and B absolutely must be exactly the same size!
func unsafeSliceConvert[A, B Numerical](x []A) []B {
	var result []B
	xHeader := (*reflect.SliceHeader)(unsafe.Pointer(&x))
	resultHeader := (*reflect.SliceHeader)(unsafe.Pointer(&result))
	resultHeader.Data = xHeader.Data
	resultHeader.Len = xHeader.Len
	resultHeader.Cap = xHeader.Cap
	return result
}
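For comparison, the same reinterpretation can be sketched with `unsafe.Slice` (available since Go 1.17) instead of `reflect.SliceHeader`. This is an alternative sketch, not the code from the branch: the name `sliceConvert` is hypothetical, the runtime size check is an addition, and the returned slice has its capacity trimmed to `len(x)`.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sliceConvert is a hypothetical variant of unsafeSliceConvert.
// It returns a []B that shares x's backing array. A and B must have
// exactly the same size, which is checked at run time here.
func sliceConvert[A, B any](x []A) []B {
	var a A
	var b B
	if unsafe.Sizeof(a) != unsafe.Sizeof(b) {
		panic("sliceConvert: element sizes differ")
	}
	if len(x) == 0 {
		return nil
	}
	// The result points at x[0], so the garbage collector sees a real
	// pointer into the backing array rather than a bare uintptr.
	return unsafe.Slice((*B)(unsafe.Pointer(&x[0])), len(x))
}

func main() {
	x := []float64{1.0, -2.0}
	u := sliceConvert[float64, uint64](x)
	fmt.Printf("%#x %#x\n", u[0], u[1])

	// Writes through u are visible in x, since both share memory.
	u[0] = 0x4000000000000000 // bit pattern of 2.0
	fmt.Println(x[0])
}
```

Because the converted slice holds an ordinary pointer, it keeps the backing array reachable by itself, which may make the `runtime.KeepAlive` call unnecessary in this variant.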

Thanks for this! I have a more-or-less immediate need for the generic version of this library. Do you have a branch I could use? (Specifically, just for integers.)

Let me iron out the tests, and then yes, I can do that. Sorry it took me so long to notice this!

One of the things that was hanging me up is that

https://pkg.go.dev/golang.org/x/exp/slices#Sort

is quite a bit faster than the old stdlib sort, which made me think the new zermelo code was super slow. In reality the bar has just been raised. I may raise the constant for when it switches to radix sort from 256 to 384 or 512...
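The size cutoff described above can be sketched as follows. The names `hybridSort` and `radixThreshold` are assumptions for illustration, the radix sort is passed in as a function so the sketch stays short, and the standard library `slices` package (Go 1.21 and later) is used in place of `golang.org/x/exp/slices`.

```go
package main

import (
	"fmt"
	"slices"
)

// radixThreshold is the slice length at which the sketch switches to
// radix sort. 256 is the value mentioned in this thread; 384 or 512 are
// the candidates being considered.
const radixThreshold = 256

// hybridSort is a hypothetical dispatcher: short slices go to the
// comparison sort, everything else goes to the supplied radix sort.
// It reports whether the radix path was taken.
func hybridSort(x []uint32, threshold int, radix func([]uint32)) bool {
	if len(x) < threshold {
		slices.Sort(x)
		return false
	}
	radix(x)
	return true
}

func main() {
	// Stand-in for a real radix sort, to keep the example self-contained.
	radix := func(x []uint32) { slices.Sort(x) }

	small := []uint32{5, 1, 4}
	fmt.Println(hybridSort(small, radixThreshold, radix), small) // false [1 4 5]

	big := make([]uint32, 1000)
	fmt.Println(hybridSort(big, radixThreshold, radix)) // true
}
```

Raising the threshold only makes sense if benchmarks on the target hardware show the comparison sort still winning at those sizes.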

@sbromberger let me know if that helps you out or if you find any problems.