Broker variant: immutable "data views" with monotonic buffer resources
Neverlord opened this issue
I've been thinking about performance recently. That's why I've been revisiting the radix-tree implementation for speeding up filter lookups.
But there's also room for improvement in how we represent messages in memory. We've discussed memory-mappable layouts in the past. With a memory-mappable representation, we would basically read a message from the network and then simply create a sort of wrapper that decodes the bytes on demand. The downside is that creating a memory-mappable format is more complicated and requires dedicated builder APIs. While "deserializing" a value becomes trivial, accessing fields in a memory-mapped data structure can come with some overhead, since data must be decoded on the fly. We would also have to change our network format.
Instead of going down this road, I think there's another option that doesn't require us to change the network format. With a monotonic buffer resource and a custom allocator, we can flatten nested data structures like broker::data in memory, reduce the number of heap allocations, and skip all destructors (by "winking out" the entire data structure). This is the same technique that makes RapidJSON fast.
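To illustrate the technique, here's a minimal C++17 sketch with hypothetical stand-in types (not Broker's actual API): a recursive value type whose strings and lists all draw their storage from one `std::pmr::monotonic_buffer_resource`, so deserializing a whole message costs one arena instead of many scattered heap allocations.

```cpp
#include <cstdint>
#include <memory_resource>
#include <string>
#include <variant>
#include <vector>

// Hypothetical flattened value type (not Broker's actual API): every
// string and list allocates from the same arena passed in by the caller.
struct flat_data;
using flat_list = std::pmr::vector<flat_data>;

struct flat_data {
  std::variant<int64_t, std::pmr::string, flat_list> value;
};

// Builds (1, 1, ("event_1", (42, "test"))) entirely inside `mr`.
// With a monotonic_buffer_resource, each allocation is a pointer bump
// and deallocation is a no-op.
flat_list build_event(std::pmr::memory_resource* mr) {
  flat_list inner{mr};
  inner.emplace_back(flat_data{int64_t{42}});
  inner.emplace_back(flat_data{std::pmr::string{"test", mr}});
  flat_list nested{mr};
  nested.emplace_back(flat_data{std::pmr::string{"event_1", mr}});
  nested.emplace_back(flat_data{std::move(inner)});
  flat_list root{mr};
  root.emplace_back(flat_data{int64_t{1}});
  root.emplace_back(flat_data{int64_t{1}});
  root.emplace_back(flat_data{std::move(nested)});
  return root;
}
```

Because every node lives inside the arena, "winking out" means a caller may skip the element destructors entirely and just let the arena release its pages in one step; no per-node cleanup is needed.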
To quantify what kind of speedup we could get, I've implemented a small micro-benchmark that uses regular broker::data and a new shallow_data implementation (not fully functional, just the types I needed for the benchmark). I've picked the name shallow_data because the original idea was that the data would also hold references into the bytes we've deserialized from, to avoid any unnecessary copying overhead. For the benchmark, this made little difference because we only have small strings.
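To sketch the zero-copy idea (again with hypothetical types, not the actual shallow_data from the benchmark): string fields can be string_views pointing back into the buffer the message was deserialized from, so no character data is copied. The views are only valid as long as the buffer lives, which is the trade-off a "shallow" representation makes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <variant>
#include <vector>

// Hypothetical shallow value type: strings borrow from the input buffer
// instead of owning their bytes.
struct shallow_node;
using shallow_list = std::vector<shallow_node>;

struct shallow_node {
  std::variant<int64_t, std::string_view, shallow_list> value;
};

// "Deserializes" a string field by pointing into `buffer` rather than
// copying bytes out of it.
shallow_node view_string(std::string_view buffer, size_t offset, size_t len) {
  return shallow_node{buffer.substr(offset, len)};
}
```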
I've picked something small to start with, so I've used a variable called event_1 with this content: (1, 1, (event_1, (42, test))). The benchmark currently only measures how long it takes to deserialize the data:
----------------------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------------------
broker_data/event_1 556 ns 556 ns 1231354
broker_data/shallow_event_1 185 ns 185 ns 3826998
It's a small data structure, so the runtime is fast either way. However, even for this very small data structure, we see a 3x speedup. Real-world messages will be larger, and when processing thousands of messages per second, the performance gain adds up quickly.
I would leave broker::data untouched and use the new "flattened" representation for the message types. Of course, there will be faster ways to do things in the new API. We might keep broker::data for convenience or eventually phase it out. In the transition phase, I think we can keep the API backwards compatible by converting to "regular" broker::data where needed and otherwise keep the migration overhead minimal. We wouldn't touch the network format or the JSON representation. We can also make this transparent to the Python bindings, if we don't remove them before that.