dmonad / crdt-benchmarks

A collection of CRDT benchmarks



Benchmarking RON-based chronofold implementation

gritzko opened this issue

Hi!

Martin @ept pointed me at this benchmark. Thanks, Martin!

Currently I have a rather unpolished RON+Chronofold C++ implementation that does text versioning/synchronization. It works in a slightly different model than Y.js or Automerge (see the draft at https://arxiv.org/abs/2002.09511). In particular, appending a character to a text takes nanoseconds, so there is no point in comparing that case.

I thought this microbenchmark should be close to the worst case for the C++ code:
[B1.4] Insert N characters at random positions (time)

That is because inserting at a given index is an expensive operation for RON+CF: it does not keep the text as a string, so it has to recalculate such positions. Also, inserting single characters at random positions defeats all chronofold optimizations. To make things comparable, I used the LineBasedText wrapper, which uses {line, col} addressing.
Long story short, the first run of the benchmark reported ~3800 ns per iteration (I use google/benchmark). That is a disappointing number... For N=6000 iterations, that is roughly 6000 × 4 µs, or ~24 ms in the "[B1.4]...(time)" line of the table.
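For reference, the conversion from the per-iteration figure to the table's total-time figure is just multiplication (numbers taken from the message above; this is a sanity check, not a new measurement):

```typescript
// Convert google/benchmark's per-iteration result into the table's
// [B1.4] total-time figure, for N = 6000 iterations.
const perIterNs = 3800;              // ~3800 ns/iteration, as reported
const iterations = 6000;             // N in the benchmark table
const totalMs = (perIterNs * iterations) / 1e6;
console.log(totalMs);                // 22.8 ms, i.e. roughly the ~24 ms quoted
```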

Note, though, that it is not directly comparable, as it does not rebuild the entire text on each edit; only the affected line is read back. (That is the key idea of the LBT wrapper; it is made for text editors.)
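To illustrate the addressing model: a {line, col} position maps to a flat character index over the newline-separated text. This is a hypothetical sketch of that mapping (the function name and shape are illustrative, not the actual RON+CF or LineBasedText API):

```typescript
// Map a {line, col} position to a flat character index, counting one
// extra character per preceding line for its "\n" terminator.
function toFlatIndex(text: string, line: number, col: number): number {
  const lines = text.split("\n");
  let idx = 0;
  for (let i = 0; i < line; i++) {
    idx += lines[i].length + 1; // +1 for the newline
  }
  return idx + col;
}

const doc = "hello\nworld";
console.log(toFlatIndex(doc, 1, 2)); // 8 — the index of "r" in "world"
```

The point of line-based addressing is that an edit only invalidates one line, so the wrapper never has to materialize the whole document as a string.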

I think I can measure the size of the resulting frame too...

Hi @gritzko,

thanks for sharing your benchmark results!

I will insert a link to this thread into the readme, so please continue sharing your results. I'd be pretty interested in the document size of the RON-encoded format and the time to rebuild the document from the RON-encoded format.

Do you think it makes sense to add Swarm to this benchmark, as it is based on RON?

I think it is a fair assumption for text types that lines contain only a few characters. We could add another benchmark for this more realistic case, e.g. "Insert N characters at random positions in ⌊N/100⌋ existing lines".
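A workload generator for that proposed case might look like the following sketch (the shape is an assumption; this benchmark does not exist in the repo yet):

```typescript
// Generate N single-character insertions confined to floor(N/100)
// existing lines, so every line stays short, as in realistic documents.
function generateEdits(n: number): { line: number; col: number; ch: string }[] {
  const numLines = Math.floor(n / 100);
  const lineLengths = new Array(numLines).fill(0);
  const edits: { line: number; col: number; ch: string }[] = [];
  for (let i = 0; i < n; i++) {
    const line = Math.floor(Math.random() * numLines);
    // Column is anywhere in the line, including one past the end.
    const col = Math.floor(Math.random() * (lineLengths[line] + 1));
    edits.push({ line, col, ch: "a" });
    lineLengths[line]++;
  }
  return edits;
}

const edits = generateEdits(6000); // 6000 edits spread over 60 lines
```

Averaging ~100 characters per line keeps the per-edit read-back cheap for line-based wrappers while still exercising random-position insertion.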

...the size of the resulting frame in RONt is about 100KB. That space is mostly taken by ids. An empty replica named "test" produces ~25 bytes/op updates in "[B1.4] Insert N characters at random positions (avgUpdateSize)". Overall, ~100KB in "[B1.4] Insert N characters at random positions (docSize)".
Again, this is close to the worst-case as random insertions break optimizations in the format.

Actually, the absolute worst case in RONt is 60 bytes per single-character insertion op: that is, if we max out every field (numbers, replica ids, non-BMP Unicode characters).
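Putting the numbers above together (a back-of-the-envelope derivation from the figures in this thread, not a new measurement):

```typescript
// Compare the theoretical worst-case document size against the
// reported ~100 KB docSize for N = 6000 random insertions.
const ops = 6000;
const worstCaseBytes = ops * 60;     // 360,000 B if every op maxed out every field
const observedBytes = 100 * 1024;    // ~100 KB reported docSize
const bytesPerOp = observedBytes / ops;
console.log(worstCaseBytes);         // 360000
console.log(bytesPerOp.toFixed(1));  // ~17.1 bytes/op in the stored document
```

So even under the random-insertion workload that defeats the format's optimizations, the stored document stays well below the worst case.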

Do you think it makes sense to add Swarm to this benchmark, as it is based on RON?

I don't think so.
gritzko/swarm is no longer supported.

Hey @gritzko,

would you mind sharing the Chronofold results for applying the real-world dataset that @ept shared? I'd like to create a separate section that compares different implementations based on a real-world dataset as a baseline.

You can find the dataset here https://github.com/dmonad/crdt-benchmarks/blob/master/benchmarks/b4-editing-trace.js (it is copied with permission from the original repository).

I'm especially interested in the size of the final document docSize, and the time to parse the encoded document parseTime.

If possible, please share some insights on how much memory your implementation uses.