Implement and run a "serious" concurrent hash table benchmarking suite

jonhoo opened this issue 4 years ago

It is easy to write simple benchmarks for concurrent hash tables, but we only learn how these data structures truly perform once they hit real-world data. There has been much research on concurrent hash tables and on how to benchmark them, so let's build a benchmark (or rather, a set of benchmarks) inspired by that work! We can then run it against the various concurrent map implementations in the Rust ecosystem and (hopefully) get some useful data out of them.

I'm hoping this thread can act as a staging ground for designing this benchmark, and that it can then be forked off into its own stand-alone project.

Work of note to get started (please let me know if you know of others):

https://www.aimlab.org/haochen/papers/fgcs18-hash.pdf (§3.3)
https://arxiv.org/pdf/1601.04017.pdf (§8.3, and §8.4 "Mixed Insertions and Finds" in particular)
The libcuckoo universal benchmark
https://www.usenix.org/legacy/event/atc11/tech/final_files/Triplett.pdf (§6.1)
redis-benchmark

/cc @xacrimon

Joel Wejdenstål · Answer 1 · Sat Jan 25 2020 18:20:13 GMT+0800 (China Standard Time)

Ah, I would love to help but I am a tad busy right now.

Jon Gjengset · Answer 2 · Sat Jan 25 2020 22:28:14 GMT+0800 (China Standard Time)

@xacrimon I CC'ed you mostly so that you could subscribe to this issue if you were interested, not in the expectation that you'd necessarily do any work :)

Jon Gjengset · Answer 3 · Wed Feb 26 2020 23:03:24 GMT+0800 (China Standard Time)

Having dug into this a bit more now, I think libcuckoo's universal benchmark is the right starting point. It does most of what we want, and the various papers add relatively little to what the benchmark already does. The one thing it is missing is concurrent access to the same key: each thread generates its own random key sequence, and in a 64-bit key space those sequences are unlikely to overlap. We'll still see contention on buckets, which is arguably most important, but it'd be good to also measure same-key contention. I think we can leave that for later, though.
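To make the same-key point concrete, here is a rough std-only sketch. It is not libcuckoo's code or any real benchmark harness. The xorshift generator, the helper names (`thread_keys`, `shared_key_count`, `run_workload`), and the `Mutex<HashMap>` stand-in table are all assumptions made for illustration. It generates one key sequence per thread, counts how many keys more than one thread touches, and optionally folds keys into a small range to force same-key contention.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};
use std::thread;

/// Minimal xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical helper: the key sequence for one thread.
/// `None` draws from the full 64-bit space; `Some(k)` folds keys
/// into `0..k` so that threads are forced onto the same keys.
fn thread_keys(thread_id: u64, n: usize, key_space: Option<u64>) -> Vec<u64> {
    // An odd multiplier keeps the seed non-zero, which xorshift requires.
    let mut rng = XorShift64((thread_id + 1).wrapping_mul(0x9E37_79B9_7F4A_7C15));
    (0..n)
        .map(|_| match key_space {
            Some(k) => rng.next() % k,
            None => rng.next(),
        })
        .collect()
}

/// Number of distinct keys that appear in more than one thread's sequence.
fn shared_key_count(sequences: &[Vec<u64>]) -> usize {
    let mut owner: HashMap<u64, usize> = HashMap::new();
    let mut shared: HashSet<u64> = HashSet::new();
    for (tid, seq) in sequences.iter().enumerate() {
        for &key in seq {
            // Remember the first thread that used this key.
            let first = *owner.entry(key).or_insert(tid);
            if first != tid {
                shared.insert(key);
            }
        }
    }
    shared.len()
}

/// Each thread inserts all of its keys, then reads them back.
/// Returns the total number of operations performed.
fn run_workload(sequences: Vec<Vec<u64>>) -> usize {
    // Stand-in table; a real run would swap in the map under test.
    let table = Arc::new(Mutex::new(HashMap::new()));
    let handles: Vec<_> = sequences
        .into_iter()
        .map(|keys| {
            let table = Arc::clone(&table);
            thread::spawn(move || {
                let mut ops: usize = 0;
                for &k in &keys {
                    table.lock().unwrap().insert(k, ());
                    ops += 1;
                }
                for k in &keys {
                    let _ = table.lock().unwrap().get(k);
                    ops += 1;
                }
                ops
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Calling `thread_keys(t, n, None)` for each thread and passing the result to `shared_key_count` shows how rarely independent 64-bit sequences collide. Passing `Some(16)` instead puts every thread on the same handful of keys, which is the case the universal benchmark does not exercise.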

Jon Gjengset · Answer 4 · Thu Feb 27 2020 02:39:59 GMT+0800 (China Standard Time)

I wrote a first draft of a more "serious" concurrent benchmark: https://github.com/jonhoo/bustle