Performance, benchmarking, speed...

Question

dumblob opened this issue 3 years ago · comments

This project seems to be getting fairly mature. The whole point of a BipBuffer is performance, so I wonder why there are no benchmark results here.

Did anybody do any measurements against some highly performant ring buffers like SPSC http://daugaard.org/blog/writing-a-fast-and-versatile-spsc-ring-buffer--performance-results/ (which perform close to the speed of memcpy)?

If so, could you please point me to the measurement results?

If not, would you consider adding some basic, very approximate benchmarks?

James Munns · Answer 1 · Fri Sep 17 2021 10:39:03 GMT+0800 (China Standard Time)

So, benchmarking is notoriously hard to get right, and the results are going to depend heavily on your use case.

That being said, when I have attempted to benchmark in the past, the numbers are pretty close to just measuring how fast your RAM/cache actually is.

From this tweet, I was able to obtain:

2.17 GiB/s, 255-byte chunks w/ 16 KiB buffer
18.00 GiB/s, 8 KiB chunks w/ 64 KiB buffer

Assuming their ints are eight bytes, so they are transferring 8 GiB, I would expect BBQueue to be able to complete the task fairly quickly - though BBQueue works better in larger batches (e.g. 8 KiB chunks would be 1024 ints at a time, where they only transfer 1 or 16 at a time).
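As a back-of-envelope check of the numbers above, here is a minimal sketch of the arithmetic. It assumes the two throughput figures quoted earlier hold for the whole transfer and that the workload is 8 GiB of 8-byte ints; the helper names `items_per_chunk` and `transfer_secs` are made up for illustration and are not part of BBQueue.

```rust
const KIB: u64 = 1024;
const GIB: u64 = 1024 * 1024 * 1024;

/// Number of fixed-size items that fit in one chunk.
fn items_per_chunk(chunk_bytes: u64, item_bytes: u64) -> u64 {
    chunk_bytes / item_bytes
}

/// Seconds needed to move `total_bytes` at a sustained rate of `gib_per_s`.
fn transfer_secs(total_bytes: u64, gib_per_s: f64) -> f64 {
    (total_bytes as f64 / GIB as f64) / gib_per_s
}

fn main() {
    // Assumption: 8 GiB total, made of 8-byte ints.
    let total = 8 * GIB;
    println!("ints per 8 KiB chunk: {}", items_per_chunk(8 * KIB, 8));

    // The two throughput figures quoted in this answer.
    for rate in [2.17, 18.00] {
        println!("{:.2} GiB/s -> {:.2} s", rate, transfer_secs(total, rate));
    }
}
```

Under those assumptions the transfer would take very roughly 0.4 s at the higher rate and 3.7 s at the lower one. These are estimates from the quoted figures, not measurements.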

dumblob · Answer 2 · Fri Sep 17 2021 18:03:15 GMT+0800 (China Standard Time)

Thanks for the pointer. Yeah, I think it's important to benchmark from 1-byte chunks as that seems to be the smallest unit used for channels.

So I'd be interested in measurements for 2-byte, 4-byte, 8-byte, 16-byte, 32-byte, 64-byte, 128-byte, 512-byte, 2048-byte, etc. sizes of the whole BipBuffer between two OS-level threads. Each measurement would be compared directly to memcpy() under the very same conditions (but in one thread, of course 😉).

I'm not aiming for highly tuned benchmark with precisely defined conditions and semantics. It's just this relative comparison with memcpy() which is most interesting for me. Of course it doesn't say much about practicality (i.e. use in real apps), but it'll show something presumably quite close to a "best case" which is what I'm currently after.
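For anyone who wants a starting point, the memcpy() side of such a comparison could look roughly like the sketch below. It uses only the Rust standard library and does not touch BBQueue at all; the two-thread BipBuffer side would have to be measured separately and compared against these numbers. Treating the listed sizes as chunk sizes, the 16 MiB total per run, and the function name `memcpy_gib_per_s` are all assumptions made for illustration.

```rust
use std::hint::black_box;
use std::time::Instant;

const GIB: f64 = (1u64 << 30) as f64;

/// Copy roughly `total_bytes` in `chunk`-sized pieces on a single thread
/// and return the observed throughput in GiB/s.
fn memcpy_gib_per_s(chunk: usize, total_bytes: usize) -> f64 {
    let src = vec![0xA5u8; chunk];
    let mut dst = vec![0u8; chunk];
    let iters = (total_bytes / chunk).max(1);

    let start = Instant::now();
    for _ in 0..iters {
        // copy_from_slice on u8 slices is a non-overlapping byte copy,
        // i.e. it has memcpy semantics.
        dst.copy_from_slice(black_box(src.as_slice()));
        // Keep the optimizer from discarding the copy.
        black_box(dst.as_mut_slice());
    }
    let secs = start.elapsed().as_secs_f64();

    (iters * chunk) as f64 / GIB / secs
}

fn main() {
    // Sizes taken from the list above; 16 MiB per run is an arbitrary choice.
    let total = 16 * 1024 * 1024;
    for chunk in [2usize, 4, 8, 16, 32, 64, 128, 512, 2048] {
        println!("{:>5} B: {:.2} GiB/s", chunk, memcpy_gib_per_s(chunk, total));
    }
}
```

Build with `--release`, otherwise the loop overhead dominates for the small sizes. Source and destination stay in cache here, so this is close to a best case for memcpy() rather than a realistic workload.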

James Munns · Answer 3 · Fri Sep 17 2021 19:50:15 GMT+0800 (China Standard Time)

Hey @dumblob, this is a personal project, so I'm generally not actively trying to market it or convince folks to use it. I'd definitely encourage you to benchmark it for your needs if you are evaluating it, and I'd be happy to help find a home in the docs or repo for your results if you are able and interested in turning them into a pull request!

dumblob · Answer 4 · Fri Sep 17 2021 21:00:27 GMT+0800 (China Standard Time)

Yup, I totally understand now.

I'm just briefly evaluating different options for inter-thread communication and am on an extremely tight time budget, so it's quite probable that I won't find any time to write such a mini benchmark framework (running and adjusting an existing one sounds viable, but creating one from scratch is a different matter).

But I'm glad you're open to PRs!

James Munns · Answer 5 · Mon Dec 05 2022 03:02:12 GMT+0800 (China Standard Time)

Closing this, still open to benchmarking PRs, but it's unlikely I will write any unless requested by a client.

Thanks!

dumblob · Answer 6 · Tue Dec 06 2022 04:01:03 GMT+0800 (China Standard Time)

Yep, same goes for me - I did not find the time to do the next step. Maybe some time later. Thanks anyway!