jamescourtney / FlatSharp

Fast, idiomatic C# implementation of Flatbuffers

Serializer.Parse stream API?

jetersen opened this issue

Maybe I am using it wrong, but it feels like FlatSharp is missing a streaming API for parse.

I have some data that is highly compressible because of its UTF-8 strings, and I would like to gzip it.

using var inputStream = new MemoryStream(gzippedFlatbuffersSerialized);
using var gzipStream = new GZipStream(inputStream, CompressionMode.Decompress);
using var outputStream = new MemoryStream();
gzipStream.CopyTo(outputStream);
return ServiceFlatBuffers.Serializer.Parse(outputStream.ToArray(), FlatBufferDeserializationOption.Progressive);

You're correct that FlatSharp doesn't have a streaming API. The official FlatBuffers library doesn't have one either. Streaming is difficult with FlatBuffers for a few reasons:

  • FlatBuffers require random access. Many items are addressed by offsets, and jumps both forwards and backwards are possible. While some streaming abstractions do support seeking (FileStream, MemoryStream), compression streams did not, last I checked. In that vein, writing FlatBuffers also requires jumping around in the buffer.
  • FlatBuffers are designed to be lazily parsed. This can mean starting from the root node many times.
  • FlatSharp has a fundamental need to get a Span<byte> from the input source (array/memory/etc).
  • While I haven't proven this, I suspect you would lose almost all of FlatBuffers' performance advantages with streams.
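
The seeking point in the first bullet can be checked with the .NET base class library alone. The following is a minimal sketch, not FlatSharp code: it gzips a dummy 64-byte payload (an assumption made purely for illustration) and compares the CanSeek property of the two stream types.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Compress a small dummy payload in memory so there is something to decompress.
byte[] payload = new byte[64];
using var compressed = new MemoryStream();
using (var gzipOut = new GZipStream(compressed, CompressionMode.Compress, leaveOpen: true))
{
    gzipOut.Write(payload, 0, payload.Length);
}
compressed.Position = 0;

using var gzipIn = new GZipStream(compressed, CompressionMode.Decompress);

// MemoryStream supports random access; the decompression stream does not.
bool memoryCanSeek = compressed.CanSeek;
bool gzipCanSeek = gzipIn.CanSeek;
Console.WriteLine($"MemoryStream.CanSeek = {memoryCanSeek}");
Console.WriteLine($"GZipStream.CanSeek   = {gzipCanSeek}");

// Trying to jump to an offset on the gzip stream fails outright.
bool seekThrew = false;
try
{
    gzipIn.Position = 8;
}
catch (NotSupportedException)
{
    seekThrew = true;
}
Console.WriteLine($"Setting Position threw: {seekThrew}");
```

Because the decompression stream is forward-only, copying it into an array first, as in the question, is the straightforward way to get a buffer that can be addressed by offset.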

You could perhaps implement IInputBuffer yourself using a stream of your choice (assuming it supports seeking), but you would end up doing lots of seeking and would need to worry about concurrency, since multiple threads could be fighting over the Position property. However, if streaming is a dealbreaker, my best advice to you is to use a serialization format that always goes left-to-right. I believe that Protobuf, MsgPack, and likely lots of others will fill this requirement for you.
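
To make the concurrency concern concrete, here is a rough sketch of offset-based reads on top of a seekable stream. It deliberately does not implement FlatSharp's IInputBuffer; the ReadInt32At helper, the gate lock object, and the toy 16-byte layout are all hypothetical names invented for this illustration.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Toy layout (an assumption, not the FlatBuffers wire format):
// bytes 0..3 hold the absolute position of a value, which lives at byte 12.
byte[] data = new byte[16];
BinaryPrimitives.WriteInt32LittleEndian(data.AsSpan(0), 12);
BinaryPrimitives.WriteInt32LittleEndian(data.AsSpan(12), 42);

using var stream = new MemoryStream(data);
object gate = new object();

// Hypothetical helper: every read has to move the shared Position, so the
// seek and the read must happen under one lock or concurrent callers would
// corrupt each other's reads.
int ReadInt32At(Stream s, long offset)
{
    if (!s.CanSeek)
    {
        throw new NotSupportedException("Random access needs a seekable stream.");
    }

    Span<byte> scratch = stackalloc byte[4];
    lock (gate)
    {
        s.Position = offset;
        int read = 0;
        while (read < scratch.Length)
        {
            int n = s.Read(scratch.Slice(read));
            if (n == 0)
            {
                throw new EndOfStreamException();
            }
            read += n;
        }
    }
    return BinaryPrimitives.ReadInt32LittleEndian(scratch);
}

// Following a single reference already costs two seeks and two lock acquisitions.
int target = ReadInt32At(stream, 0);
int value = ReadInt32At(stream, target);
Console.WriteLine(value);
```

Every field access pays for a lock and a seek, which is the kind of overhead the last bullet above suspects would erase the format's performance advantages.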

This doesn't answer your compression question, but FlatBuffers lends itself very well to memory-mapped files. I know this isn't streaming, but if all you need is file I/O, memory-mapped files do offer many of the advantages of streaming, since the OS manages how much is actually kept in memory at a time.
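
As an illustration of why memory-mapped files suit random access, the sketch below uses only System.IO.MemoryMappedFiles. The temporary file, its 4096-byte size, and the offsets are assumptions made for the example; how the mapped memory would be handed to FlatSharp is left out, since that depends on the FlatSharp version in use.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Write a small file with a known value near the end (illustrative data only).
string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
byte[] contents = new byte[4096];
BitConverter.GetBytes(1234).CopyTo(contents, 4000);
File.WriteAllBytes(path, contents);

int first = -1;
int last = -1;
try
{
    using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
    using (var view = mmf.CreateViewAccessor())
    {
        // Reads can jump backwards and forwards freely; the OS pages data in
        // on demand, and there is no shared Position to fight over.
        last = view.ReadInt32(4000);
        first = view.ReadInt32(0);
    }
}
finally
{
    File.Delete(path);
}

Console.WriteLine($"first = {first}, last = {last}");
```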

I'm not sure if you have duplicate strings in your data or not, but FlatSharp does support string deduplication (see the shared strings sample). If you are encoding the same string multiple times, FlatSharp can deduplicate those in the output for you. The way this works is that strings are referenced by pointer (one of those random access cases), so for shared strings, FlatSharp will track all the places that need to point to a given string and write all those pointers at once.

You won't get close to the results that you might with gzip, but for cases where there are repeated strings, it can make a big difference in the output size.
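
The size effect of sharing strings can be sketched with a toy encoder. This is not FlatSharp's implementation (the reply above describes it tracking the places that point at a string and writing those pointers together) and it is not the FlatBuffers wire format; the Encode function and its layout, a table of 4-byte offsets followed by length-prefixed UTF-8 bodies, are assumptions chosen only to show why writing each distinct string once shrinks the output.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical toy encoder: one 4-byte offset per value, then the string bodies.
byte[] Encode(IReadOnlyList<string> values, bool shareStrings)
{
    var bodies = new MemoryStream();
    var offsets = new int[values.Count];
    var seen = new Dictionary<string, int>();
    int tableSize = 4 * values.Count;

    for (int i = 0; i < values.Count; i++)
    {
        // With sharing on, a repeated string just reuses the earlier offset.
        if (shareStrings && seen.TryGetValue(values[i], out int existing))
        {
            offsets[i] = existing;
            continue;
        }

        offsets[i] = tableSize + (int)bodies.Position;
        byte[] utf8 = Encoding.UTF8.GetBytes(values[i]);
        bodies.Write(BitConverter.GetBytes(utf8.Length), 0, 4);
        bodies.Write(utf8, 0, utf8.Length);
        seen[values[i]] = offsets[i];
    }

    var output = new MemoryStream();
    foreach (int offset in offsets)
    {
        output.Write(BitConverter.GetBytes(offset), 0, 4);
    }
    bodies.WriteTo(output);
    return output.ToArray();
}

var values = new[] { "alpha", "beta", "alpha", "alpha" };
byte[] plain = Encode(values, shareStrings: false);
byte[] shared = Encode(values, shareStrings: true);
Console.WriteLine($"without sharing: {plain.Length} bytes, with sharing: {shared.Length} bytes");
```

When every string is unique, the two outputs are the same size, which is why sharing only pays off for data with repeated strings.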

@jamescourtney thanks for the detailed explanation. That helped me understand the use case for FlatBuffers better, so it is much appreciated!
Hopefully others looking for a streaming API can use this as a reference. It also helped me understand the different deserialization options.
You're right, we are trying out various approaches to deserialization and comparing their benefits.
In our benchmarks I was using Progressive, which proved to be unfair for the purposes of comparison, although it might prove to be a good fit for the application itself.
Greedy deserialization was the right choice for the benchmark comparison.

The input does not contain duplicates so deduplication would not help.

I'll close the issue.
Thanks for answering my question and providing excellent answers.