elixir-toniq / circular_buffer

Performance plans?

fhunleth opened this issue

This isn't an issue. I just wasn't sure how best to contact you.

On Nerves, we have a library, RingLogger, that uses :queue internally (like this one) for a circular buffer. It's used quite a bit, and we've been seeing performance issues - mostly around the amount of garbage it creates. I started looking into replacing :queue with an implementation that optimizes for the operations we actually do.
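
For context, the :queue-backed approach typically looks something like the sketch below - module and function names here are illustrative, not RingLogger's or this library's actual API. Once the buffer is full, every insert both enqueues the new item and dequeues the oldest, and all of that term rebuilding is where the garbage tends to come from.

```elixir
defmodule QueueBuffer do
  # Minimal sketch of a :queue-backed circular buffer (illustrative only).
  defstruct q: :queue.new(), size: 0, max_size: 0

  def new(max_size), do: %__MODULE__{max_size: max_size}

  # While filling up, just enqueue at the rear.
  def insert(%__MODULE__{size: size, max_size: max} = buf, item) when size < max do
    %{buf | q: :queue.in(item, buf.q), size: size + 1}
  end

  # Once full, drop the oldest from the front and enqueue the new item.
  def insert(%__MODULE__{} = buf, item) do
    {{:value, _oldest}, q} = :queue.out(buf.q)
    %{buf | q: :queue.in(item, q)}
  end

  def to_list(%__MODULE__{q: q}), do: :queue.to_list(q)
end
```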

Your library is nice in that it is well-tested and has an API that doesn't expose functions that would break the optimizations that I'm making.

I'm wondering whether, rather than only updating RingLogger, it would be better for me to send PRs over here and have RingLogger depend on circular_buffer.

This, of course, depends on whether you're interested, have the time, and whether my optimizations negatively impact your use cases.

Here's a preview of the direction: https://github.com/fhunleth/circular_buffer/blob/ring/lib/circular_buffer2.ex.

Simplistic benchmarks currently show the alternative circular_buffer implementation using less than 60% of the memory of the :queue implementation and running the same set of inserts in 50-60% of the time (see the commits for benchmark outputs). I still need to validate this on a production device, but this feels like it will be a meaningful improvement for Nerves.
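
For reference, the Benchee comparison could look something like the following. The `CircularBuffer2` module name and the workload are assumptions based on the linked branch; see the actual commits for the real benchmark scripts and numbers.

```elixir
# Compare a long run of inserts into a small buffer for both implementations.
buf = CircularBuffer.new(100)
buf2 = CircularBuffer2.new(100)
items = Enum.to_list(1..10_000)

Benchee.run(
  %{
    ":queue-based insert" => fn ->
      Enum.reduce(items, buf, &CircularBuffer.insert(&2, &1))
    end,
    "two-list insert" => fn ->
      Enum.reduce(items, buf2, &CircularBuffer2.insert(&2, &1))
    end
  },
  time: 5,
  # memory_time enables Benchee's memory measurements alongside run time.
  memory_time: 2
)
```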

Hey @fhunleth! I'm happy to support this library, especially if it's useful to Nerves. Performance is definitely something I'd like to improve. I'll take a look at your implementation 👍 I'm not too worried about any changes messing with my specific uses. I'm only using this library in https://github.com/keathley/orwell, and I've needed to look at tuning it for a while.

Thanks!

I'm currently running the updates on a device to get some more realistic data than what I'm looking at with Benchee. Will see what happens.

Update from watching this on a production device:

  1. Any savings from fewer memory allocations or reductions are obscured by other code running in the same process.
  2. GC is further obscuring my measurements.
  3. I probably have more questions now than before I did the tests, but nothing suggests that the changes make things worse.

I also dug deeper into the :queue implementation and Okasaki's Purely Functional Data Structures dissertation. What I was doing is basically a simplistic implementation of :queue. I feel it still has value since 1. the main steady-state operation of inserting a new item and deleting the oldest is so simple, 2. there's an opportunity to avoid a list concat for reduce in the future, and 3. it does work better in microbenchmarks. Having studied :queue, I don't understand how it could ever perform worse than :queue, since it's doing the exact same internal list management. I wish that would have shown up in my device tests, though.

Having read the source to :queue and Okasaki, I'd like to make the terminology I use consistent with theirs - probably just renaming my a and b lists to front and rear - and then send a PR over.

Possibly an unrelated topic, but I think it's kind of relevant when discussing different performance/trade-off implementations of a fairly general API like circular_buffer...

I'm relying on circular_buffer at the moment in my not-yet-hex-released library https://github.com/bglusman/live_dashboard_history (which, by the way, would love a new hex release of circular_buffer - I'm relying on the bug fix in master, and I don't think I can publish to hex until that's released, at least on some Elixir versions). What I'd like to do is wrap circular_buffer in a behaviour and default to circular_buffer as the implementation for my library, but allow config to override that and swap in an ets-based solution, or even redis or something - whatever someone implements. It'd be even better, though, if that behaviour were baked into something shared outside my library: either in circular_buffer itself, or perhaps in a minimal dependency that mostly just defines behaviours and no implementations. Curious for either/both of your thoughts on the best approach and the right place for this. I can also open a new issue, of course, but it's more of a notion at the moment.
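
To make the idea concrete, here's a rough sketch of what such a behaviour and a config-driven swap could look like. This is entirely hypothetical - none of it exists in circular_buffer or live_dashboard_history today, and the callback names just mirror the library's existing functions.

```elixir
defmodule CircularBuffer.Behaviour do
  # Hypothetical behaviour that any backing implementation would conform to.
  @callback new(max_size :: pos_integer()) :: term()
  @callback insert(buffer :: term(), item :: term()) :: term()
  @callback to_list(buffer :: term()) :: [term()]
end

defmodule HistoryStore do
  # A consumer (e.g. live_dashboard_history) picks the implementation from
  # config at compile time and falls back to the default CircularBuffer.
  @impl_mod Application.compile_env(:my_app, :buffer_impl, CircularBuffer)

  def new(size), do: @impl_mod.new(size)
  def insert(buf, item), do: @impl_mod.insert(buf, item)
  def to_list(buf), do: @impl_mod.to_list(buf)
end
```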

I've considered backing circular_buffer with ets for a while. That's probably the highest throughput way to handle this. But I'm not sure if that would work for @fhunleth's use case or not.
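
For illustration, an ets-backed ring buffer might look something like this sketch - not the library's API, just the shape of the trade-off: writes mutate a shared table instead of building new immutable terms.

```elixir
defmodule EtsBuffer do
  # Minimal sketch of a fixed-size, overwrite-oldest buffer backed by ets.
  def new(max_size) do
    tid = :ets.new(:ets_buffer, [:set, :public])
    {tid, max_size, 0}
  end

  # Overwrite slot rem(count, max_size), so the oldest entry is replaced once full.
  def insert({tid, max_size, count}, item) do
    :ets.insert(tid, {rem(count, max_size), item})
    {tid, max_size, count + 1}
  end

  def to_list({tid, max_size, count}) do
    oldest = max(count - max_size, 0)

    for i <- oldest..(count - 1)//1 do
      [{_, item}] = :ets.lookup(tid, rem(i, max_size))
      item
    end
  end
end
```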

ets was the slowest option in my microbenchmarks. That's not that surprising, though, since my use case and therefore my benchmarks are all single process.

This is currently the fastest option for me: https://github.com/fhunleth/circular_buffer/blob/ring/lib/circular_buffer2.ex.

It's basically the same as Erlang's :queue except that it takes advantage of the knowledge that it's a circular buffer to simplify insert/2.
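
The shape of that is roughly the following - a sketch based on this thread, not a copy of circular_buffer2.ex. In the steady state an insert only drops the head of the front list and conses the new item onto the rear list; the rear is reversed into the front only when the front runs out.

```elixir
defmodule TwoListBuffer do
  # Two-list circular buffer sketch: `front` holds the oldest items in order,
  # `rear` holds the newest items in reverse order.
  defstruct front: [], rear: [], size: 0, max_size: 0

  def new(max_size), do: %__MODULE__{max_size: max_size}

  # While filling up, just cons onto the rear list.
  def insert(%__MODULE__{size: size, max_size: max} = buf, item) when size < max do
    %{buf | rear: [item | buf.rear], size: size + 1}
  end

  # Steady state: drop the oldest from the front, cons the new item onto the rear.
  def insert(%__MODULE__{front: [_oldest | front]} = buf, item) do
    %{buf | front: front, rear: [item | buf.rear]}
  end

  # Front exhausted: reverse the rear once to become the new front.
  def insert(%__MODULE__{front: []} = buf, item) do
    [_oldest | front] = Enum.reverse(buf.rear)
    %{buf | front: front, rear: [item]}
  end

  def to_list(%__MODULE__{front: front, rear: rear}), do: front ++ Enum.reverse(rear)
end
```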

I've been a little quiet in posting updates because 1. I got busy at work, and 2. while this is a demonstrable improvement, I haven't fixed the root cause of the excess memory use/garbage creation and was focused on understanding other areas as time permitted.

I don't want to hold up any work. I'm accustomed to Nerves use cases being slightly different than backend ones, so if ets is an improvement for your use cases, then please go that direction.

Apologies, I may have confused things - like I said, this maybe should have been a separate issue. I was just thinking about how best to abstract the implementation from the interface, and possibly still leverage the same test coverage for multiple implementations at once... specifically for circular_buffer, but also as a general question/direction for relatively general data tools like this that have an arbitrary number of possible implementations, each with different trade-offs. Behaviours are obviously a good way to handle this in general, but the boundaries across packages seem like something I haven't seen great patterns for - making them modular and flexible as dependencies and configurable options.

Here's an idea... what if circular_buffer had a behaviour both in the implementation and in the tests, and defaulted to using the library's implementation in both, but we documented a way for any other implementation to run the tests against its own implementation of the behaviour, to ensure it passes without copy-pasting all the tests? Then one library can be both the reference implementation and the source of truth on correctness of the behaviour, and a second and/or third implementation optimized for different use cases can be conformant and easy to swap in, with confidence that it works the same but with different trade-offs. That's really what I want for my live_dashboard_history, and it seems likely a fair number of other libraries/apps out there might like it for this or other general-purpose data structure libraries, so maybe if it works we can start a trend.
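
One hypothetical way to share the test suite without copy-pasting is an ExUnit case parameterized by the implementation module - again, a sketch; nothing like this exists in circular_buffer today.

```elixir
defmodule CircularBuffer.ConformanceTest do
  # Shared conformance tests; `impl` is the module implementing the behaviour.
  defmacro __using__(impl: impl) do
    quote do
      use ExUnit.Case, async: true

      test "drops the oldest item once the buffer is full" do
        buf = Enum.reduce(1..4, unquote(impl).new(3), &unquote(impl).insert(&2, &1))
        assert unquote(impl).to_list(buf) == [2, 3, 4]
      end
    end
  end
end

# Each implementation's test file then becomes a couple of lines, e.g. for a
# hypothetical ets-backed module:
defmodule MyEtsBufferTest do
  use CircularBuffer.ConformanceTest, impl: MyEtsBuffer
end
```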

@bglusman https://github.com/ckampfe/cbuf went with the behaviour implementation.

I'd personally rather stick with a simple one-file implementation.

I'm not sure that you really want to choose the backing implementation. I'd rather just have one, optimized solution that fits most cases.

yeah, in most cases I think I agree, but for this use case with live dashboard I'd like someone to be able to swap in redis or something if they wanted to store a much larger amount of history. So the idea is that the default will be one implementation, but you can provide any module that implements the behaviour and it should work... I suppose technically you don't even need a behaviour for that, but it seems like a good idea. dunno, I could change my mind, we'll see.

Closed by #4.