Broker variant: immutable "data views" with monotonic buffer resources
Neverlord opened this issue
I've been thinking about performance recently. That's why I've been revisiting the radix-tree implementation for speeding up filter lookups.
But there's also room for improvement in how we represent messages in memory. We've discussed memory-mappable layouts in the past. With a memory-mappable representation, we would basically read a message from the network and then simply create a sort of wrapper that decodes the bytes on demand. The downside is that creating a memory-mappable format is more complicated and requires dedicated builder APIs. While "deserializing" a value becomes trivial, accessing fields in a memory-mapped data structure can come with some overhead, since data must be decoded on the fly. We would also have to change our network format.
Instead of going down this road, I think there's another option that doesn't require us to change the network format. With a monotonic buffer resource and a custom allocator, we can flatten nested data structures like broker::data in memory, reduce the number of heap allocations, and skip all destructors (by "winking out" the entire data structure). This is the same technique that makes RapidJSON fast.
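To illustrate the technique, here's a minimal C++17 sketch with hypothetical stand-in types (not Broker's actual API): a recursive value type whose strings and lists all draw their storage from one `std::pmr::monotonic_buffer_resource`, so deserializing a whole message costs one arena instead of many scattered heap allocations.

```cpp
#include <cstdint>
#include <memory_resource>
#include <string>
#include <variant>
#include <vector>

// Hypothetical flattened value type (not Broker's actual API): every
// string and list allocates from the same arena passed in by the caller.
struct flat_data;
using flat_list = std::pmr::vector<flat_data>;

struct flat_data {
  std::variant<int64_t, std::pmr::string, flat_list> value;
};

// Builds (1, 1, ("event_1", (42, "test"))) entirely inside `mr`.
// With a monotonic_buffer_resource, each allocation is a pointer bump
// and deallocation is a no-op.
flat_list build_event(std::pmr::memory_resource* mr) {
  flat_list inner{mr};
  inner.emplace_back(flat_data{int64_t{42}});
  inner.emplace_back(flat_data{std::pmr::string{"test", mr}});
  flat_list nested{mr};
  nested.emplace_back(flat_data{std::pmr::string{"event_1", mr}});
  nested.emplace_back(flat_data{std::move(inner)});
  flat_list root{mr};
  root.emplace_back(flat_data{int64_t{1}});
  root.emplace_back(flat_data{int64_t{1}});
  root.emplace_back(flat_data{std::move(nested)});
  return root;
}
```

Because every node lives inside the arena, "winking out" means a caller may skip the element destructors entirely and just let the arena release its pages in one step; no per-node cleanup is needed.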
To quantify what kind of speedup we could get, I've implemented a small micro-benchmark that uses regular broker::data and a new shallow_data implementation (not fully functional, just the types I needed for the benchmark). I've picked the name shallow_data because the original idea was that the data would also hold references into the bytes we've deserialized from, to avoid any unnecessary copying overhead. For the benchmark, this made little difference because we only have small strings.
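To sketch the zero-copy idea (again with hypothetical types, not the actual shallow_data from the benchmark): string fields can be string_views pointing back into the buffer the message was deserialized from, so no character data is copied. The views are only valid as long as the buffer lives, which is the trade-off a "shallow" representation makes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <variant>
#include <vector>

// Hypothetical shallow value type: strings borrow from the input buffer
// instead of owning their bytes.
struct shallow_node;
using shallow_list = std::vector<shallow_node>;

struct shallow_node {
  std::variant<int64_t, std::string_view, shallow_list> value;
};

// "Deserializes" a string field by pointing into `buffer` rather than
// copying bytes out of it.
shallow_node view_string(std::string_view buffer, size_t offset, size_t len) {
  return shallow_node{buffer.substr(offset, len)};
}
```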
I've picked something small to start with, so I've used a variable called event_1 with this content: (1, 1, (event_1, (42, test))). The benchmark currently only measures how long it takes to deserialize the data:
----------------------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------------------
broker_data/event_1 556 ns 556 ns 1231354
broker_data/shallow_event_1 185 ns 185 ns 3826998
It's a small data structure, so the runtime is fast either way. However, even for this very small data structure, we see a 3x speedup. Real-world messages will be larger, and when processing thousands of messages per second, the performance gain adds up quickly.
I would leave broker::data untouched and use the new "flattened" representation for the message types. Of course, there will be faster ways to do things in the new API. We might keep broker::data for convenience or eventually phase it out. In the transition phase, I think we can keep the API backwards compatible by converting to "regular" broker::data where needed and otherwise keep the migration overhead minimal. We wouldn't touch the network format or the JSON representation. We can also make this transparent to the Python bindings, if we don't remove them before that.