Workiva / go-datastructures

A collection of useful, performant, and threadsafe Go datastructures.

Do you need generics, or not

anlhord opened this issue

I've noticed that you use interface{} (EFACE, not ideal due to run-time type checking) and also regular interfaces (IFACE, slow dispatch).

I've hacked gccgo to add basic generics (1-parametric polymorphic functions) to Go.
If my proposal is completed, your data structures will be fully performant, work on any type, and there will be NO code bloat (NO hidden code generation for different types, as with go generate, etc.).

The Go people said they will NOT work on adding generics to Go.

The problem is that, due to ideological issues, I'm afraid it probably won't be merged into the Go language. This means that if I finish this and it is rejected, I will need to fork the Go language (3 compilers) and compete with it. So I will probably throw away the whole project. I will soon quit Go.

Side note: please look at how a generic algorithm, e.g. an arbitrary slice sort, can be implemented by breaking the type system ("escaping the Google walled garden") using unsafe: https://github.com/gomacro/sort

If this is just a toy project, excuse me, and have fun.

Anlhord, I use interface{} when no methods are required of the object (queues, sets, etc) and an interface when I'm trying to make an ordered generic data structure, and usually that interface has a single method like Compare.
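To make the distinction above concrete, here is a minimal sketch. The names Comparator, Uint64Key, Queue, and Max are hypothetical and chosen for illustration; the repository's actual interfaces and signatures may differ.

```go
package main

import "fmt"

// Comparator is a hypothetical single-method interface modeled on the
// "Compare" method mentioned above; the real signature may differ.
type Comparator interface {
	// Compare returns a negative number, zero, or a positive number when the
	// receiver is less than, equal to, or greater than other.
	Compare(other Comparator) int
}

// Uint64Key wraps a primitive so it can live in an ordered structure.
type Uint64Key uint64

func (k Uint64Key) Compare(other Comparator) int {
	o := other.(Uint64Key) // run-time type assertion on every comparison
	switch {
	case k < o:
		return -1
	case k > o:
		return 1
	default:
		return 0
	}
}

// Queue needs no methods from its items, so interface{} is enough.
type Queue struct{ items []interface{} }

func (q *Queue) Put(v interface{}) { q.items = append(q.items, v) }

func (q *Queue) Get() (interface{}, bool) {
	if len(q.items) == 0 {
		return nil, false
	}
	v := q.items[0]
	q.items = q.items[1:]
	return v, true
}

// Max needs an ordering, so it requires the Comparator interface.
func Max(a, b Comparator) Comparator {
	if a.Compare(b) >= 0 {
		return a
	}
	return b
}

func main() {
	q := &Queue{}
	q.Put("anything")
	q.Put(42)
	v, _ := q.Get()
	fmt.Println(v)                               // anything
	fmt.Println(Max(Uint64Key(3), Uint64Key(7))) // 7
}
```

The queue accepts mixed types because it never inspects them, while the ordered helper pays for a dynamic call and a type assertion on each comparison.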

I've noticed a 5x performance hit (skiplist and B-trees are good cases) from using an interface as opposed to a more primitive type. You'll notice in the skiplist case that fetching an item by position is quite a bit faster than by value even though the algorithm is nearly identical. The difference is that the by-position lookup gets to compare uint64s, while the by-value lookup has Compare called on the interface for every search. The difference is not minor. This is unfortunate because there are a lot of cases where I may want a B-tree or skiplist to store ints or uints with optimal performance, but have to wrap them in an interface first.
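One rough way to observe this kind of gap is to run the same search loop once over raw uint64 keys and once over keys behind an interface. The sketch below uses a binary search over a sorted slice as a stand-in for the skiplist and B-tree lookups; it is not the repository's code, and Comparator and Uint64Key are hypothetical names. The ratio it prints will vary by machine and Go version, so no particular figure should be expected.

```go
package main

import (
	"fmt"
	"time"
)

// Comparator is a hypothetical single-method ordering interface.
type Comparator interface {
	Compare(other Comparator) int
}

type Uint64Key uint64

func (k Uint64Key) Compare(other Comparator) int {
	o := other.(Uint64Key)
	if k < o {
		return -1
	}
	if k > o {
		return 1
	}
	return 0
}

// searchUint64 returns the first index i with keys[i] >= target.
func searchUint64(keys []uint64, target uint64) int {
	lo, hi := 0, len(keys)
	for lo < hi {
		mid := lo + (hi-lo)/2
		if keys[mid] < target {
			lo = mid + 1
		} else {
			hi = mid
		}
	}
	return lo
}

// searchComparator is the same algorithm, but every comparison is a
// dynamic call through the interface.
func searchComparator(keys []Comparator, target Comparator) int {
	lo, hi := 0, len(keys)
	for lo < hi {
		mid := lo + (hi-lo)/2
		if keys[mid].Compare(target) < 0 {
			lo = mid + 1
		} else {
			hi = mid
		}
	}
	return lo
}

func main() {
	const n = 1 << 16
	prim := make([]uint64, n)
	boxed := make([]Comparator, n)
	for i := range prim {
		prim[i] = uint64(i) * 2
		boxed[i] = Uint64Key(i * 2)
	}

	sink := 0 // keeps the compiler from discarding the search results

	start := time.Now()
	for i := 0; i < n; i++ {
		sink += searchUint64(prim, uint64(i))
	}
	primElapsed := time.Since(start)

	start = time.Now()
	for i := 0; i < n; i++ {
		sink += searchComparator(boxed, Uint64Key(i))
	}
	boxedElapsed := time.Since(start)

	fmt.Println("uint64:", primElapsed, "interface:", boxedElapsed, "checksum:", sink)
}
```

For measurements worth acting on, the same two functions could be moved into benchmarks run with go test -bench, which handles warm-up and iteration counts.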

In nearly all my profiling now I see the interface methods pop up near the top of the list. I haven't traced the overhead completely, but I assume it involves two factors:

  1. Calling an interface method involves I2T calls in the runtime. I haven't looked at what those calls involve, but the overhead is certainly there.
  2. It really hurts when I'm trying to achieve better memory locality to reduce cache misses. By accepting interfaces, I believe I've nearly guaranteed two cache misses for every comparison: one to grab the object that holds the concrete implementation's type and a pointer to the implementation, and another to actually grab the implementation. I may be wrong on this, but I believe cache thrashing is causing some of the overhead with interfaces.
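The memory-layout half of the second point can be checked directly: an interface value occupies two machine words (type information plus a data word), and in current Go implementations the data word generally points at a value stored elsewhere. The sketch below only measures sizes; it does not measure cache misses, and Comparator and Uint64Key are hypothetical names.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Comparator is a hypothetical single-method ordering interface.
type Comparator interface {
	Compare(other Comparator) int
}

type Uint64Key uint64

func (k Uint64Key) Compare(other Comparator) int {
	o := other.(Uint64Key)
	if k < o {
		return -1
	}
	if k > o {
		return 1
	}
	return 0
}

// WordsPerInterface reports how many machine words one interface value
// occupies.
func WordsPerInterface() int {
	var boxed Comparator = Uint64Key(1)
	return int(unsafe.Sizeof(boxed) / unsafe.Sizeof(uintptr(0)))
}

// ElementBytes reports the per-element size of a []uint64 and of a
// []Comparator. The second figure excludes the value each element refers to.
func ElementBytes() (raw, boxed int) {
	var r uint64
	var b Comparator
	return int(unsafe.Sizeof(r)), int(unsafe.Sizeof(b))
}

func main() {
	raw, boxed := ElementBytes()
	fmt.Println("bytes per []uint64 element:", raw)
	fmt.Println("bytes per []Comparator element:", boxed, "(plus the value it refers to)")
	fmt.Println("words per interface value:", WordsPerInterface())
}
```

A node that stores raw uint64 keys therefore keeps them contiguous, while a node that stores interface values keeps only the two-word headers contiguous and has to follow a pointer to reach each key.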

For anyone using these data structures, if you need absolute performance I would recommend copying/pasting the implementations and using your specific type.

If I could get generics in Go I would be forever grateful to you. I really think performance here could be greatly improved and a lot of code could be simplified.
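For readers on Go 1.18 or later, which added type parameters to the language, the kind of code wished for here can be sketched as follows. The Ordered constraint and the Search function are hypothetical names written for this example, not part of the repository or of the proposal discussed in this thread.

```go
package main

import "fmt"

// Ordered is a hand-written constraint listing a few types that support <.
// It is an illustrative subset, not an exhaustive list.
type Ordered interface {
	~int | ~int64 | ~uint64 | ~float64 | ~string
}

// Search returns the first index i with keys[i] >= target. The comparison is
// a plain < on T, with no interface method call and no type assertion.
func Search[T Ordered](keys []T, target T) int {
	lo, hi := 0, len(keys)
	for lo < hi {
		mid := lo + (hi-lo)/2
		if keys[mid] < target {
			lo = mid + 1
		} else {
			hi = mid
		}
	}
	return lo
}

func main() {
	fmt.Println(Search([]uint64{0, 2, 4, 6}, 5))      // 3
	fmt.Println(Search([]string{"a", "c", "e"}, "b")) // 1
}
```

Whether this matches the speed of a hand-specialized copy depends on how the compiler instantiates the function, so it is worth benchmarking rather than assuming.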

As for myself, I use a lot of these datastructures in my own (and my company's) projects but I'm really tired of knowing that performance is not all it could be. I'm finishing up the concurrent B-tree (PALM) and then I think I'm going to start porting some of these over to Rust to see what the differential is. If it is significant, I'll probably end up porting the whole repo and dedicate myself to that language.

Given the positive reply, let me show you what I currently have:
http://our-gol-842.appspot.com/p/ePsSIabncM

This is a linked list demo:
http://our-gol-842.appspot.com/p/5MMZpov8UH

I don't yet have a working generic slice, e.g. a [] type object.

Use copy and paste?

@anlhord A go generate tool could be used to output each algorithm in this repository optimized for the specific type being targeted, which would alleviate the performance issues described by @dustinhiatt-wf. Of course you could copy and paste the code and change it to your needs, but then any time the algorithm is changed further (bug fixes, optimizations), you would have to make the changes manually.

That being said I do not know how much work it would require to write this tool to accommodate various types for each algorithm. Nonetheless I wanted to pass along the idea.
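As a rough idea of what such a tool involves, the sketch below renders one algorithm from a text/template for a chosen element type and formats the result with go/format. The template, the params struct, and the Generate function are all hypothetical; a real tool would write the output to a file and be invoked through a //go:generate directive.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
	"text/template"
)

// searchTmpl holds one algorithm with the element type left as a parameter.
var searchTmpl = template.Must(template.New("search").Parse(`package {{.Package}}

// Search{{.Name}} returns the first index i with keys[i] >= target.
func Search{{.Name}}(keys []{{.Type}}, target {{.Type}}) int {
	lo, hi := 0, len(keys)
	for lo < hi {
		mid := lo + (hi-lo)/2
		if keys[mid] < target {
			lo = mid + 1
		} else {
			hi = mid
		}
	}
	return lo
}
`))

// params names the output package, the function suffix, and the element type.
type params struct {
	Package, Name, Type string
}

// Generate renders the template for one type and returns gofmt-style source.
func Generate(p params) (string, error) {
	var buf bytes.Buffer
	if err := searchTmpl.Execute(&buf, p); err != nil {
		return "", err
	}
	src, err := format.Source(buf.Bytes())
	if err != nil {
		return "", err
	}
	return string(src), nil
}

func main() {
	out, err := Generate(params{Package: "search", Name: "Uint64", Type: "uint64"})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Because the algorithm lives in one template, a bug fix there reaches every generated variant on the next go generate run, which addresses the maintenance concern with manual copies.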

@bruth you might want to check out gen. It is a tool for this job.

Closing as stale; the discussion seems to have moved elsewhere.