composablesys / collabs

Collabs library monorepo

Home Page: https://collabs.readthedocs.io/

Interfaces RFC: Sets (PlainSet, CrdtSet)

mweidner037 opened this issue

Interfaces

https://github.com/composablesys/compoventuals/blob/master/client/src/crdt/set/interfaces.ts

Questions

  • Is the distinction between reset and clear meaningful? reset is supposed to do an observed-reset, while clear is supposed to call delete on every element; the two differ when deletes do not follow add-wins/observed-remove semantics. It is also questionable whether reset should be included at all (i.e., whether the interfaces should extend Resettable), since some implementations choose to ignore reset (they do nothing). On the other hand, it is convenient for the set interfaces to be resettable, so that you can embed generic sets in Riak-style collections without having to write e.g. CrdtSet & Resettable everywhere.
  • Should we have the generic type CreateArgs on (general) CrdtSets, or have a separate interface for implementations that can take arguments to create? The Riak-style Crdts can't accept arguments to create and so they set CreateArgs to []. In general, it could be awkward to have this extra type parameter to carry around, although it does default to [], so you only have to write CrdtSet<C> in typical usage.
  • In CrdtSet parameter names and docstrings, I've been calling the elements "value Crdts" instead of "values", to emphasize the fact that conceptually, the value is the content represented by the Crdt, not the Crdt itself. However, I've left the method names and event field names as "value", for compatibility with PlainSet. Should this be made consistent (either always "value Crdt" or always "value")?
  • CrdtSet.restore: would "add" be a better name? "add" would be more consistent with PlainSet, but "restore" emphasizes the fact that you can't add an arbitrary Crdt; you can only restore a previous element that has been deleted, assuming you kept a reference to it somewhere else. Also, should the Add event be renamed Restore? I currently call it Add since it is emitted both when an element is initially created and when it is restored, but we could separate those into different events.
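
To make the CreateArgs question concrete, here is a minimal sketch of how a defaulted type parameter reads at the use site. Everything in it (CrdtSetSketch, LocalSetSketch, CounterSketch, LabelSketch) is a hypothetical, simplified stand-in and not the actual contents of interfaces.ts; the set is purely local, so it says nothing about the replicated semantics of reset versus clear.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Resettable {
  reset(): void;
}

// CreateArgs defaults to [], so typical usage is just CrdtSetSketch<C>.
interface CrdtSetSketch<C, CreateArgs extends unknown[] = []> extends Resettable {
  create(...args: CreateArgs): C;
  delete(valueCrdt: C): void;
  has(valueCrdt: C): boolean;
  readonly size: number;
}

// Local-only implementation: no messages, no replication.
class LocalSetSketch<C, CreateArgs extends unknown[] = []>
  implements CrdtSetSketch<C, CreateArgs>
{
  private readonly present = new Set<C>();
  private readonly factory: (...args: CreateArgs) => C;

  constructor(factory: (...args: CreateArgs) => C) {
    this.factory = factory;
  }

  create(...args: CreateArgs): C {
    const valueCrdt = this.factory(...args);
    this.present.add(valueCrdt);
    return valueCrdt;
  }

  delete(valueCrdt: C): void {
    this.present.delete(valueCrdt);
  }

  has(valueCrdt: C): boolean {
    return this.present.has(valueCrdt);
  }

  reset(): void {
    this.present.clear();
  }

  get size(): number {
    return this.present.size;
  }
}

class CounterSketch {
  count = 0;
}

class LabelSketch {
  readonly label: string;
  constructor(label: string) {
    this.label = label;
  }
}

// Typical usage: the second type parameter is omitted, so create() takes no arguments.
const counters: CrdtSetSketch<CounterSketch> =
  new LocalSetSketch<CounterSketch>(() => new CounterSketch());
const counter = counters.create();

// A set whose create accepts arguments has to spell out CreateArgs.
const labels: CrdtSetSketch<LabelSketch, [string]> =
  new LocalSetSketch<LabelSketch, [string]>((label) => new LabelSketch(label));
const todo = labels.create("todo");
```

The cost under discussion is visible in the last declaration: any code that wants to accept "a CrdtSet with arguments" must carry the extra type parameter around, whereas a separate interface would keep it out of the common case.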

Planned implementations

This is an area where I feel there will be too many implementations if I make them all, but I am not sure which to prioritize. So the general question is: which implementations should be included?

PlainSet

  • BooleanPlainSet: constructs a PlainSet from an arbitrary Boolean Crdt that is initially false, by storing the set as a CrdtMap<T, Boolean>.
  • AddWinsPlainSet: standard add-wins set, implemented using BooleanPlainSet with a TrueWinsBoolean.
  • SequentialPlainSet: uses sequential semantics, which is only allowed if you never add and remove elements concurrently. It is not eventually consistent if used incorrectly, so it should come with a warning; should it even be exported? It is optimized because it does not need to store a Boolean Crdt per value, instead using an ordinary Set for its state.
  • GPlainSet: a grow-only set. Implemented as a subclass of SequentialPlainSet that throws an error on delete.
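
The structure of these PlainSet variants can be sketched as follows. This is a local-only illustration under stated assumptions: BooleanLike is a hypothetical stand-in for a Boolean Crdt (a plain flag with no replication or conflict resolution), an ordinary Map stands in for CrdtMap, and all class names carry a Sketch suffix to mark them as not being the real implementations.

```typescript
// Stand-in for a Boolean Crdt: only the local flag is modeled.
interface BooleanLike {
  value: boolean;
}

// Membership of each value is delegated to one Boolean per value.
class BooleanPlainSetSketch<T> {
  private readonly flags = new Map<T, BooleanLike>();
  private readonly makeBoolean: () => BooleanLike;

  constructor(makeBoolean: () => BooleanLike) {
    this.makeBoolean = makeBoolean;
  }

  // Lazily creates the Boolean for a value, as a map of Crdts would.
  private flagFor(value: T): BooleanLike {
    let flag = this.flags.get(value);
    if (flag === undefined) {
      flag = this.makeBoolean();
      this.flags.set(value, flag);
    }
    return flag;
  }

  add(value: T): void {
    this.flagFor(value).value = true;
  }

  delete(value: T): void {
    this.flagFor(value).value = false;
  }

  has(value: T): boolean {
    const flag = this.flags.get(value);
    return flag !== undefined && flag.value;
  }
}

// Sequential semantics: an ordinary Set is the whole state.
class SequentialPlainSetSketch<T> {
  protected readonly state = new Set<T>();

  add(value: T): void {
    this.state.add(value);
  }

  delete(value: T): void {
    this.state.delete(value);
  }

  has(value: T): boolean {
    return this.state.has(value);
  }
}

// Grow-only: same state, but delete is rejected.
class GPlainSetSketch<T> extends SequentialPlainSetSketch<T> {
  delete(_value: T): void {
    throw new Error("GPlainSetSketch is grow-only; delete is not supported");
  }
}

// The Boolean must start out false so that every value is initially absent.
const flagged = new BooleanPlainSetSketch<string>(() => ({ value: false }));
flagged.add("x");

const growOnly = new GPlainSetSketch<number>();
growOnly.add(1);
```

The sketch makes the memory trade-off visible: BooleanPlainSetSketch keeps one flag object per value it has ever touched, while SequentialPlainSetSketch stores only the values themselves.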

CrdtSet

I know of two kinds of implementations that are "memory-safe" (no tombstones), i.e., their size in memory is bounded by the user-visible state size:

  • Riak-style: consider a value Crdt to be present whenever it is nontrivial (not fully reset), and GC any fully-reset values, recreating them from scratch if needed later. If you want delete(valueCrdt) to actually make valueCrdt no-longer-present, it has to call valueCrdt.reset(), as in the Riak map. CreateArgs has to be [] in order for "recreating them from scratch" to work (since the original args are forgotten).
  • Yjs-style: deletes are permanent; a deleted valueCrdt will no longer receive any messages, will cause an error when operated on locally, and cannot be restored.
    • Cons: no restore; deleted valueCrdts used elsewhere will be frozen in states that are possibly inconsistent across replicas.
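
The Riak-style rule (present whenever nontrivial, delete implemented as reset, garbage-collect fully reset values) can be sketched locally as below. This is only an illustration under assumptions: values are addressed by a string id, isTrivial() is a hypothetical method for "fully reset", garbage collection runs eagerly inside delete, and nothing is replicated.

```typescript
// Hypothetical value interface; isTrivial() means "fully reset".
interface ResettableValueSketch {
  reset(): void;
  isTrivial(): boolean;
}

class CounterValueSketch implements ResettableValueSketch {
  count = 0;

  add(n: number): void {
    this.count += n;
  }

  reset(): void {
    this.count = 0;
  }

  isTrivial(): boolean {
    return this.count === 0;
  }
}

class RiakStyleSetSketch<C extends ResettableValueSketch> {
  private readonly live = new Map<string, C>();
  // No create arguments: a collected value must be recreatable from nothing.
  private readonly factory: () => C;

  constructor(factory: () => C) {
    this.factory = factory;
  }

  // Returns the value for id, recreating it from scratch if it was collected.
  get(id: string): C {
    let value = this.live.get(id);
    if (value === undefined) {
      value = this.factory();
      this.live.set(id, value);
    }
    return value;
  }

  // Present means "exists and is not fully reset".
  has(id: string): boolean {
    const value = this.live.get(id);
    return value !== undefined && !value.isTrivial();
  }

  // Delete is implemented as reset, followed by garbage collection.
  delete(id: string): void {
    const value = this.live.get(id);
    if (value !== undefined) value.reset();
    this.gc();
  }

  // Drops every fully reset value so memory tracks the visible state.
  gc(): void {
    const trivialIds: string[] = [];
    this.live.forEach((value, id) => {
      if (value.isTrivial()) trivialIds.push(id);
    });
    for (const id of trivialIds) this.live.delete(id);
  }

  storedCount(): number {
    return this.live.size;
  }
}

const riakSet = new RiakStyleSetSketch(() => new CounterValueSketch());
riakSet.get("a").add(3); // now nontrivial, hence present
riakSet.delete("a"); // reset, then collected
```

Because the factory takes no arguments, a later get("a") can always rebuild the value, which is the reason CreateArgs has to be [] for this style.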

I haven't thought of good names for these yet; the preliminary implementations are called RiakCrdtSet and YjsCrdtSet, but I expect we'll want to change those to describe what they do, instead of naming other libraries.

It also seems wise to make some CrdtSets that are not memory-safe but have otherwise optimal semantics:

  • A Yjs-style set where delete(valueCrdt) marks valueCrdt as deleted, but doesn't freeze or reset the Crdt, instead keeping it as a tombstone. restore(valueCrdt) then restores it with its previous state.
  • Same, but valueCrdt is automatically restored if someone performs an operation concurrent to its deletion, similar to revivals in the Riak map (but without delete-as-reset). This seems closest to user expectations: if someone else deletes a document that I'm concurrently typing in, I would like the delete to have no effect.
  • Perhaps also a grow-only variant, which is the same as the Yjs-style set except that it throws an error if delete is called.
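
A minimal local sketch of the first tombstone variant above, assuming hypothetical names (TombstoneSetSketch, DocSketch) and omitting replication entirely. The automatic-revival variant is not modeled, since it depends on detecting concurrent operations.

```typescript
class TombstoneSetSketch<C extends object> {
  private readonly present = new Set<C>();
  // Deleted values are kept (not frozen, not reset) so they can be restored.
  private readonly tombstones = new Set<C>();
  private readonly factory: () => C;

  constructor(factory: () => C) {
    this.factory = factory;
  }

  create(): C {
    const valueCrdt = this.factory();
    this.present.add(valueCrdt);
    return valueCrdt;
  }

  // Marks the value as deleted but keeps its state as a tombstone.
  delete(valueCrdt: C): void {
    if (this.present.delete(valueCrdt)) {
      this.tombstones.add(valueCrdt);
    }
  }

  // Only values previously created by this set can be restored.
  restore(valueCrdt: C): void {
    if (this.present.has(valueCrdt)) return; // already present: nothing to do
    if (!this.tombstones.delete(valueCrdt)) {
      throw new Error("restore: value was not created by this set");
    }
    this.present.add(valueCrdt);
  }

  has(valueCrdt: C): boolean {
    return this.present.has(valueCrdt);
  }

  tombstoneCount(): number {
    return this.tombstones.size;
  }
}

class DocSketch {
  text = "";
}

const docs = new TombstoneSetSketch(() => new DocSketch());
const doc = docs.create();
doc.text = "draft";
docs.delete(doc);
doc.text += " v2"; // still editable while deleted, unlike the frozen Yjs-style value
docs.restore(doc); // comes back with its previous state, including the edit
```

The tombstones set is exactly what makes this variant not memory-safe: it grows with every delete, regardless of the user-visible state size.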

Helpers

AbstractPlainSet and AbstractCrdtSet give default implementations of certain methods (e.g., iterators that are aliases of each other). DecoratedCrdtSet applies the decorator pattern to a CrdtSet. These are currently used internally; not sure if they should also be exported.

Implementation idea: SortedCSet, which maintains a sorted view of its elements, like Java's SortedSet.

Regarding iterator order: when an implementation guarantees an order (e.g., the order in which elements were added locally), mention it in the docstrings. Such a guarantee could be useful in some applications, e.g., a chat log where you want to display messages in the order they were locally received. We need to make sure the order is preserved in save data.
Likewise for CMap.
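
One way to keep a local-insertion iterator order across save and load is sketched below, assuming the state is an ordinary JavaScript Set (which iterates in insertion order) and that save data is a JSON string. saveInOrder and loadInOrder are hypothetical helper names, not part of the library.

```typescript
// Serializes elements in iteration (insertion) order.
function saveInOrder(set: Set<string>): string {
  return JSON.stringify(Array.from(set));
}

// Re-adds elements in the saved order, so iteration order is preserved.
function loadInOrder(saveData: string): Set<string> {
  return new Set<string>(JSON.parse(saveData) as string[]);
}

const received = new Set<string>();
received.add("hi");
received.add("how are you?");
received.add("hi"); // re-adding an existing element does not move it

const restoredLog = loadInOrder(saveInOrder(received));
```

The point of the design is that order lives in the serialized sequence itself; a save format keyed by, say, sorted element ids would silently lose it.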

Done by #138