ssbc / ssb-subset-replication-spec

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Subset replication for SSB

Status: Design phase

This module aims to provide methods for synchronizing a subset of one or more SSB feeds. The main motivation is to enable faster initial sync by only replicating a certain subset of messages that can then be supplimented with a frontier sync such as EBT.

Feeds in classical SSB can said to contain multiple subset of messages, this could be messages of a certain type (eg. contact or chess moves) or could be private messages pertaining to a private group.

In classical SSB, feeds up to a hops count (normally 2 or 3) are replicated in full. There also exists a method to get a particular message out of order, the security of this hinges on the fact that you get the ooo message by the hash and so is a special case of engtangled messages.

Here we are interested in adding APIs that, given ssb-meta-feed, allows two peers to exchange feeds and subset of feeds to allow for partial replication.

We are mainly interested in two classes of messages: a subset of messages of a classical SSB feed and a tangle of messages given a common root message.

getSubset(query, options): source

This method can be seen as a more general version of createHistoryStream where the id, or what you are interesting in, first needs to be defined. Query specifies what data you are interested in.

The query parameter is an instance of the query language defined in options.querylang.

To support pagination, startFrom, pageSize and descending can be specified in the options parameter.

It is up to the receiving end of the query to determine if the query results in too much load. This could be if someone tries to get data for a large number of feeds without proper pagination, or if the query includes arguments in a form that strains the query planner.

To get the latest 10 public post messages of a particular feed the following query and options arguments can be used:

{
  "author": "@6CAxOI3f+LUOVrbAl0IemqiS7ATpQvr9Mdw9LC4+Uv0=.ed25519",
  "type": "post",
  "private": false
}
{
  "querylang": "ssb-ql-0",
  "descending": true,
  "pageSize": 10
}

Result:

[msg1, msg2, ..., msg10]

Query language

ssb-ql-0

We start by supporting a proof-of-concept query language called "ssb-ql-0" which is a precursor for the more powerful "ssb-ql-1" (see below).

Queries in ssb-ql-0 are merely JSON objects with three required fields:

  • author: a valid feed ID as a string
  • type: a message type, can be any non-empty string, or null, which signals that messages of any type are accepted
  • private: a boolean

If private: true then type MUST be null. This is to prevent leaking metadata about private messages to the public.

ssb-ql-1

Status: under design. We recommend using ssb-ql-0 in the meantime. ssb-ql-1 has challenges such as determining when two queries are equivalent, which may be an NP complete problem. Read more on this SSB thread: %FjK8SdR2Q8n3XBwFKiI8ebdSr/Tp7bMzhr2R6YMq+b0=.sha256

Here we define a mini query language that can be used to specify a subset of data. The goal of this format is to be easy to parse and also be restrictive in the number of operations supported to make it easier to map to indexes and to limit the attack surface.

The query uses an object notation for operators that looks very similar to JSON. The objects contains two keys: op the name of the operation and a second parameter for the arguments given the operation.

Base operators:

op args name args type
and args [op, ...]
or args [op, ...]
type string string
author feed string

Note that ssb-ql-1 is only for public messages and is incapable of querying private messages, unlike ssb-ql-0.

The spec is open for implementations to add new operators relatively easy. This allows experimentation and implementations should simply error if presented with an unsupported operator. A new potential operator could be isPrivate or isBox2 but extreme care must be taken not to leak information using these queries around private information.

getTangle (TBD)

Tangles in SSB behave differently than other types of messages in that they are linked together across multiple feeds using links. This allows for a different kind of synchonization than getSubset. There are two common examples of where this could be useful: for getting all the messages of an old message threads that is outside the frontier of messages and to synchronize a private group (as all messages in those are part of a single tangle). While private groups are by definition private to external observers, the fact that nodes connecting in SSB are authenticated by their ID and that rooms allow direct connections between peer allows synchronization of a private group directly with another group member who also knows which messages belong to the group.

As private groups can grow quite large it becomes important that there is a way to synchronize only subsets but also changes or the latest part of a tangle. The membership part of a private group has a tangle root and thus can be replicated on its own.

Consider using set-reconciliation for this.

Prior work

Earlier threads on SSB discussing partial replication:

  • %L9m5nHRqpXM4Zkha1ENTk5wNOXQMduve8Hc9+F0RLZI=.sha256
  • %VNCOf3pfP7hjL3lcFvBSSOzRPtR0WHDtvlXvkBcha3I=.sha256
  • %gaYxXAEoSKf3dnw2OlI2EuoduFmkiU4kU+CncbBImPk=.sha256
  • %jM6Dw0BEg9IPVkDH5VaoiVSbt3gCnUfSoO1BhX2Ntpw=.sha256
  • %SEXkvEQCJd2jBrUXDQcFPVix/5eUffoKBURWyy5yYrU=.sha256

About