Cause

An EDN-like CRDT (Causal Tree) for Clojure(Script) that automatically tracks history and resolves conflicts.

Cause is like git for collaborative applications. It is designed to look and feel like regular EDN data (maps, lists), while also exposing the richness of the append only CvRDTs. Cause handles common collaboration problems like deterministically synchronizing complex data structures across devices, and tracking history. Rather than rely on a central authority, conflicts can be resolved on every client. This makes Cause well suited for decentralized p2p applications.

The implementation takes a lot of cues from Data Laced with History and prior art related to CRDTs.

Why?

Why create Cause? Why another CRDT?

Before starting Cause I was working on a rich text editor in Slate. I wanted online/offline sync, collaboration, granular version control, zero dependency on a centralized server, and all of this in one persistent data structure that's easy to render, edit and save to a database. Projects like Automerge and Schism offered compelling solutions, but fell short on one or more of the properties I was looking for.

Properties

This is what Cause strives to do:

One data structure for everything. In Cause this is the node. Nodes are just triples of [id cause value]. They store identity and causality, and can easily be sent between clients, written to a database and rendered. Multiple in-memory caches make the read and write performance of common node operations fast. And those caches can be reconstituted from a bag of related nodes at runtime, so at rest storage is reduced.
Persist all the data. Clojure's immutable data structures and immutable data in general is great! It's hard to call Cause immutable since it is designed to be used in distributed eventually consistent systems. Nodes will inevitably arrive in different orders and clients will often have different sets of nodes. What can be guaranteed is that no node is deleted. Cause is append only, with deletions being represented by adding tombstones that hide nodes instead of actually excising the original data. This is similar to Datomic and makes infinite undo and change tracking possible.
Simple conflict resolution. The functions that determine the current state of a causal collection should be easy to reason about. This makes them easier to develop correctly and easier to work with intuitively since they have very few corner cases. Most of this work happens in about 50 LOC with the causal.shared/weave-node function and the two predicates that power it.
Idiomatic EDN. The higher level causal data structures that are built from nodes implement many of the same protocols as their EDN counterparts. Most Clojure functions should just work on CausalLists and CausalMaps the same way they'd work on lists and maps.
Extended EDN. Causal collections have some properties that don't fit into existing Clojure collection protocols, specifically identity and history that can span multiple interrelated collections. These database-like properties can efficiently and easily be managed with a CausalBase.

How it Works

Causal collections (lists, maps) are made of nodes
- node - a triplet made of [id cause value]
  - id - an easily sortable triplet of [lamport-ts site-id tx-index]
    - lamport-ts - a logical clock incremented on every transaction. A natural int counter starting at 0.
    - site-id - a random uid that represents a location (your computer, my computer)
    - tx-index - the order within a transaction. A natural int counter that is reset to 0 at the start of every transaction.
      - A transaction is when both lamport-ts and site-id are the same for multiple nodes. Then tx-index serves as the tie breaker
    - sorting ids
      - first on logical time lamport-ts
      - if multiple times collide they are sorted on location site-id
      - if multiple nodes are inserted at the same time on the same machine that's a transaction and tx-index is used keep them unique and ordered
  - cause - the identifier that "caused" a node
    - in a list cause is the id of the preceding node, creating a linked list
    - in a map cause can be a key or an id (it's normally a key, but tombstone operations like undo might be caused by a more granular id)
  - value is whatever you set it to, a string, a keyword, another causal collection
    - since causal collections are append only, if you want to delete (hide) a value you must insert a causal.core/hide value. This is non destructive and enables synchronization and time travel.

Nodes are all you need. From a bag of nodes we can consistently weave (build an ordered cache of) them every time. Nodes are also unique so we can deduplicate them across a chatty network. And they include complete history information: time = lamport-ts, who = site-id, transaction = lamport-ts & site-id, tx order = tx-index. They do not include wall clock time, but they do have everything needed for infinite undo / redo as well as version control and blame tracking.

Cause trades a linear increase in spacial complexity (where n is the set of all additions and subtractions) for all the properties above. Lists are the most complicated to keep sorted and suffer from a linearly increasing time complexity on both reads and writes. Fortunately transacting contiguous sequences is O(n + m) and not O(n * m), so operations like pasting a large sequence stay linear.

Examples

CausalBase

This is the highest level abstraction and what most people will want.

(ns example
  (:require [causal.core :as cause]))

(def cb (atom (cause/base)))                                            ; like a database, but for nested causal collections

(def starting-data {:a 1 :b [2 3]})

(swap! cb cause/transact [[nil nil starting-data]])                     ; inserts a root map collection with a nested list under the :b key

(def root-collection-uuid (cause/get-uuid (cause/get-collection @cb)))  ; get the root collection with the single arity form of cause/get-collection
(swap! cb cause/transact [[root-collection-uuid :c 4]                   ; "assoc" :c 4
                          [root-collection-uuid :a cause/hide]])        ; "dissoc" :a
                                                                        ; Transactions are atomic so the addition of :c 4 and the removal of :a will happen at the same time

(cause/causal->edn @cb)                                                 ; {:b (2 3) :c 4}

(seq (cause/get-collection @cb))                                        ; ([[2 "BIW6uN8ONfCyf" 0] :c 4] -- the nodes that make up the root map
                                                                        ;  [[1 "BIW6uN8ONfCyf" 3] :b :causal.base.ref/eA0tTyXym77oPL0Lf4X6P])

(seq (cause/get-collection @cb (:b (cause/get-collection @cb)))))       ; ([[1 "BIW6uN8ONfCyf" 1] [0 "0" 0] 2] -- the nodes that make up the list under the :b key
                                                                        ;  [[1 "BIW6uN8ONfCyf" 2] [1 "BIW6uN8ONfCyf" 1] 3])

; TODO: add more examples
; undo
; redo
; reset
; transact into specific points in a list

CausalList

(ns example
  (:require [causal.core :as cause]))
  
(def cl (atom (cause/list :a :b :c)))

(first @cl)   ; [[1 "a-site-id" 0] [0 "0" 0] :a] -- [0 "0" 0] is the id of the root-node that every causal-list starts with
(second @cl)  ; [[2 "a-site-id" 0] [1 "a-site-id" 0] :b] -- :b is "caused" by :a
(last @cl)    ; [[3 "a-site-id" 0] [2 "a-site-id" 0] :c] -- :c is "caused" by :b

(swap! cl conj :d)       ; :d will be inserted at the end, after :c
(cause/causal->edn @cl)  ; (:a :b :c :d)

(ffirst @cl)                                     ; [1 "a-site-id" 0] -- the ID of the first active element, in this case :a
(swap! cl cause/append (ffirst @cl) :ab)         ; :ab will be inserted between :a and :b since (ffirst @cl) is the ID of :a
(cause/causal->edn @cl)                          ; (:a :ab :b :c :d)
(swap! cl cause/append (ffirst @cl) cause/hide)  ; this will hide :a, cause is append only so nothing gets deleted
(cause/causal->edn @cl)                          ; (:ab :b :c :d)
(swap! cl cause/append (ffirst @cl) cause/hide)  ; this will hide :ab
(cause/causal->edn @cl)                          ; (:b :c :d)

; seq and any function derived from Seqable (aka most functions implemented by CausalLists and CausalMaps) return nodes, not just values.
; This conveniently exposes id and cause metadata so you always have access to it.
(seq @cl)  ; ([[2 "a-site-id" 0] [1 "a-site-id" 0] :b]
           ;  [[3 "a-site-id" 0] [2 "a-site-id" 0] :c]
           ;  [[4 "a-site-id" 0] [3 "a-site-id" 0] :d])

; Despite being append only and nodes only ever being hidden you should
; expect functions to behave like they're only working with the active nodes.
(count @cl)  ; 3

CausalMap

(ns example
  (:require [causal.core :as cause]))
  
(def cm (atom (cause/map :a 1 :b 2)))

; causal-maps are even simpler to work with.
; For the most part you can just assoc and dissoc and ignore ids.
(:a @cm)                 ; 1
(swap! cm assoc :c 3)
(get @cm :c)             ; 3
(swap! cm dissoc :b)
(cause/causal->edn @cm)  ; {:a 1 :c 3}
(seq @cm)                ; ([[0 "a-site-id" 0] :a 1] -- exposes the ID of each node
                         ;  [[2 "a-site-id" 0] :c 3])

Installation

The API and data structures have no guarantee of stability until 0.1.0 is published (See Roadmap).

If you want to try this pre-release code that will likely change you can use git deps.

Development

Test CLJ

lein test

or with a watcher

lein bat-test auto

Test CLJS

npm install
lein cljs-test

Dev REPL / nREPL

lein repl

user=> (cljs) ; to get a cljs repl
cljs.user=> :cljs/quit ; to go back to clj repl

Notes

Clojure Collection Functions

Most Clojure collection functions should just work with causal collections. E.g. count returns the number of active values, automatically ignoring hidden and special values. Or for a causal map (get (cause/new-causal-map :foo "bar") :foo) will return "bar".

On the other hand, seq and functions related to seq like first, last, next, rest, map return active nodes, not just values. This preserves the helpful causal metadata like ids and causes so you have access to it while iterating over a collection.

Gotchas:

All causal collections act like their EDN equivalent collection around Clojure collection functions. This means they only reveal active values. E.g. if a key is dissoc'd from a causal-map you will not be able to get it even though it does still exist in the underlying causal data structure and is only hidden with a tombstone.

History / Change Tracking / Version Control

Causal collections will automatically track the order values are inserted into them via lamport timestamps. This includes deletions and potentially undo, redo and rewind operations. These values are all immutable, so the entire history of a causal collection exists within it. Blame information is also available via site-ids so you can see which site made each change.