ghyatzo/woot-crdt

WOOT framework

The idea is to have a framework that respects the following three properties (called consistency model):

(1) Convergence: applying the same set of operations to each site results in the same buffer between them all
(2) Intention Preservation: for each operation, applying it at each site is the same as applying it at the starting site (i.e. : if I have "ab" on one side and I add 1 between 'a' and 'b', I want the operation applied on another buffer respect the fact that 1 is between 'a' and 'b', that is the intention).
(3) Casuality Preservation: given any pair of operations Oa, Ob, if they are randomly related Oa -> Ob, then it must be that Oa is executed before Ob. (i.e. in our very simple case the relation Oa -> Ob, is a "temporal" one, I will explain later what is meant by "time")

The framework is structured around two basic operations:

ins(a < c < b) insert the character c between the character a and the character b
del(c) deletes the character c. (deletion is special, see below)

These operations differ from the usual position-based operations ins(p, c) and del(p) where we specify the position of the buffer in which to insert or remove a character. This causes problems, as we have seen, because we have to make sure the buffer does not change during these operations.

The problem with operations like ins(a < c < b) is that specifying "between a and b" offers only partial ordering (i.e. I can't tell if something comes before the other for each pair of elements. There will be some pairs for which I cannot say anything). To overcome this problem, the framework uses special characters with a kind of unique id. Thanks to which we can, starting from the partial order indicated by 'intentions' we can instead reconstruct a linear order equal for all.

Here the first (perhaps) pain point: The framework does not act on standard buffers, but acts on a more complex representation of them (in our case we should keep a mirror of the real buffer and switch between them). Each element of the buffer is called a W-Character and is a tuple of 5 elements: <id, char, v, cp, cn> where: -id is the unique id of our W-character -char is the actual alphanumeric character (the one we may see in the actual buffer) -v is the visibility flag (see below for tombstone/del discourse) -cp is the unique id of the previous W-character of our intention (the 'a' in a < c < b) -cn is the unique id of the next W-character of our intention (the 'b' in a < c < b)

What is a character id?

A character id is a tuple <ns, ng> which depends on the "time" and site where the character was generated. Each site (or client) will have a unique representative id, and that will be ns. (It is important to stress that we need to be able to make comparisons between these ids.)

Second: time, ng represents the time that character was generated, and is nothing more than a counter that is incremented every time we add a character to the buffer. This ensures that the characters are truly unique.

Pain point number two (maybe):

The issue of delete and the tombstone approach. As you may have guessed, in this model you don't really delete, you "hide". In fact, applying the delete operation is nothing more than flipping the v flag of the desired character. Thus, after a session the WOOT buffer will contain all the characters ever typed in that session.

In my opinion, this is not too problematic since we are not talking about millions of characters, and in any case it is always possible to do some garbage collection on the buffer once in a while. (This is a problem in a purely peer 2 peer context, but we have a central server that can act as manager).

How it works:

That said, the operation is very simple:

one site generates an operation, applies it to its own buffer and then broadcasts it.
other sites receive the operation and store it in a pool.
the sites can then iterate within this pool and apply all the operations, and this is very important, THAT ARE APPLICABLE.

What do i mean by "are applicable"? Suppose you have your buffer "ab", I perform two operations ins(a < 1 < b) and ins(a < 3 < 1) one after the other resulting in "a31b". These two operations are sent to another client, but he receives them in the opposite order: since his buffer is still "ab" the operation ins(a < 3 < 1) makes no sense, since 1 does not yet exist. No problem, we skip it and wait until we get an operation that adds that 1 (the base assumption is that we receive all operations eventually). At which point we can try the first operation again, and this time it works and gets placed in the right place. This is the main mechanism behind the idea of Eventual Consistency.

The cool part is the promise that if I get all the operations generated by others, no matter what order or how long in between, I will always be able to rebuild the buffer as intended. The operations are completely independent of the state of the buffer when they arrive.

Code

It's a very simple project with just a main.cpp file where I do some testing and a types.cpp + header file where I roughly implement the framework as a Proof of Concept. In the main.cpp file I have also made little diagrams explaining what is going on and showing the commutativity between various combinations of operations.

ghyatzo / woot-crdt

WOOT framework

What is a character id?

Pain point number two (maybe):

How it works:

Code

About

Languages