dmonad / crdt-benchmarks

A collection of CRDT benchmarks



Benchmarking RON-based chronofold implementation

gritzko opened this issue

Hi!

Martin @ept pointed me at this benchmark. Thanks, Martin!

Currently I have a rather unpolished RON+Chronofold C++ implementation that does text versioning/synchronization. It works in a slightly different model than Y.js or Automerge (see the draft at https://arxiv.org/abs/2002.09511). In particular, appending a character to a text takes nanoseconds, so there is no point in comparing that case.

I thought this microbenchmark should be close to the worst case for the C++ code:
[B1.4] Insert N characters at random positions (time)

That is because inserting at a given index is an expensive operation for RON+CF: it does not keep the text as a string, so it has to recalculate such positions. Also, inserting single characters at random positions defeats all chronofold optimizations. To make things comparable, I used the LineBasedText wrapper, which uses {line, col} addressing.
Long story short, the first run of the benchmark reported ~3800 ns per iteration (I use google/benchmark). That is a disappointing number... For N=6000 iterations, that is roughly 6000 × 4 µs, or ~24 ms in the "[B1.4]...(time)" line of the table.
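For reference, the conversion from the per-iteration figure to the table's total-time figure is just multiplication (numbers taken from the message above; this is a sanity check, not a new measurement):

```typescript
// Convert google/benchmark's per-iteration result into the table's
// [B1.4] total-time figure, for N = 6000 iterations.
const perIterNs = 3800;              // ~3800 ns/iteration, as reported
const iterations = 6000;             // N in the benchmark table
const totalMs = (perIterNs * iterations) / 1e6;
console.log(totalMs);                // 22.8 ms, i.e. roughly the ~24 ms quoted
```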

Note, though, that it is not directly comparable, as it does not rebuild the entire text on each edit; only the affected line is read back. (That is the key idea of the LBT wrapper; it is made for text editors.)
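To illustrate the addressing model: a {line, col} position maps to a flat character index over the newline-separated text. This is a hypothetical sketch of that mapping (the function name and shape are illustrative, not the actual RON+CF or LineBasedText API):

```typescript
// Map a {line, col} position to a flat character index, counting one
// extra character per preceding line for its "\n" terminator.
function toFlatIndex(text: string, line: number, col: number): number {
  const lines = text.split("\n");
  let idx = 0;
  for (let i = 0; i < line; i++) {
    idx += lines[i].length + 1; // +1 for the newline
  }
  return idx + col;
}

const doc = "hello\nworld";
console.log(toFlatIndex(doc, 1, 2)); // 8 — the index of "r" in "world"
```

The point of line-based addressing is that an edit only invalidates one line, so the wrapper never has to materialize the whole document as a string.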

I think I can measure the size of the resulting frame too...

Hi @gritzko,

thanks for sharing your benchmark results!

I will insert a link to this thread into the readme, so please continue sharing your results. I'd be pretty interested in the document size of the RON-encoded format and the time to rebuild the document from the RON-encoded format.

Do you think it makes sense to add Swarm to this benchmark, as it is based on RON?

I think it is a fair assumption for text types that lines contain only a few characters. We could add another benchmark for this more realistic case, e.g. "Insert N characters at random positions in ⌊N/100⌋ existing lines".
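A workload generator for that proposed case might look like the following sketch (the shape is an assumption; this benchmark does not exist in the repo yet):

```typescript
// Generate N single-character insertions confined to floor(N/100)
// existing lines, so every line stays short, as in realistic documents.
function generateEdits(n: number): { line: number; col: number; ch: string }[] {
  const numLines = Math.floor(n / 100);
  const lineLengths = new Array(numLines).fill(0);
  const edits: { line: number; col: number; ch: string }[] = [];
  for (let i = 0; i < n; i++) {
    const line = Math.floor(Math.random() * numLines);
    // Column is anywhere in the line, including one past the end.
    const col = Math.floor(Math.random() * (lineLengths[line] + 1));
    edits.push({ line, col, ch: "a" });
    lineLengths[line]++;
  }
  return edits;
}

const edits = generateEdits(6000); // 6000 edits spread over 60 lines
```

Averaging ~100 characters per line keeps the per-edit read-back cheap for line-based wrappers while still exercising random-position insertion.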

...the size of the resulting frame in RONt is about 100KB. That space is mostly taken by ids. An empty replica named "test" produces ~25 bytes/op updates in "[B1.4] Insert N characters at random positions (avgUpdateSize)". Overall, ~100KB in "[B1.4] Insert N characters at random positions (docSize)".
Again, this is close to the worst-case as random insertions break optimizations in the format.

Actually, the absolute worst case in RONt is 60 bytes per single-character insertion op: that is, if we max out every field (numbers, replica ids, non-BMP Unicode characters).
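Putting the numbers above together (a back-of-the-envelope derivation from the figures in this thread, not a new measurement):

```typescript
// Compare the theoretical worst-case document size against the
// reported ~100 KB docSize for N = 6000 random insertions.
const ops = 6000;
const worstCaseBytes = ops * 60;     // 360,000 B if every op maxed out every field
const observedBytes = 100 * 1024;    // ~100 KB reported docSize
const bytesPerOp = observedBytes / ops;
console.log(worstCaseBytes);         // 360000
console.log(bytesPerOp.toFixed(1));  // ~17.1 bytes/op in the stored document
```

So even under the random-insertion workload that defeats the format's optimizations, the stored document stays well below the worst case.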

Do you think it makes sense to add Swarm to this benchmark, as it is based on RON?

I don't think so.
gritzko/swarm is no longer supported.

Hey @gritzko,

would you mind sharing the Chronofold results for applying the real-world dataset that @ept shared? I'd like to create a separate section that compares different implementations based on a real-world dataset as a baseline.

You can find the dataset here https://github.com/dmonad/crdt-benchmarks/blob/master/benchmarks/b4-editing-trace.js (it is copied with permission from the original repository).

I'm especially interested in the size of the final document docSize, and the time to parse the encoded document parseTime.

If possible, please share some insights on how much memory your implementation uses.