terminusdb / terminusdb-store

a tokio-enabled data store for triple data

ID remapping

matko opened this issue

We need a generic way for a layer to remap its IDs.

This is required because layer transformations must not mess up the IDs we use to query the layer. When squashing a layer stack, for example, a brand new dictionary is built. Since a dictionary ID is bound to the entry's lexical position in the dictionary, combining the dictionaries of different layers into one would result in very different IDs.
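To make the problem concrete, here is a minimal sketch of how squashing shifts IDs. It assumes 1-based IDs and that each layer's IDs continue where its parent's end. The helpers `build_dictionary`, `id_of`, `stacked_id` and `squash` are hypothetical names for illustration, not the terminusdb-store API.

```rust
/// Build a sorted, deduplicated dictionary. An entry's ID is its 1-based
/// position in lexical order. (Hypothetical helper for illustration.)
fn build_dictionary(entries: &[&str]) -> Vec<String> {
    let mut dict: Vec<String> = entries.iter().map(|s| s.to_string()).collect();
    dict.sort();
    dict.dedup();
    dict
}

/// ID of `entry` within a single dictionary, if present.
fn id_of(dict: &[String], entry: &str) -> Option<u64> {
    dict.binary_search_by(|e| e.as_str().cmp(entry))
        .ok()
        .map(|pos| pos as u64 + 1)
}

/// ID of `entry` in a layer stack (base layer first). Assumption: each
/// layer's IDs continue where the previous layer's IDs end.
fn stacked_id(layers: &[Vec<String>], entry: &str) -> Option<u64> {
    let mut offset = 0u64;
    for dict in layers {
        if let Some(id) = id_of(dict, entry) {
            return Some(offset + id);
        }
        offset += dict.len() as u64;
    }
    None
}

/// Squash the stack into one brand new dictionary.
fn squash(layers: &[Vec<String>]) -> Vec<String> {
    let all: Vec<&str> = layers.iter().flatten().map(|s| s.as_str()).collect();
    build_dictionary(&all)
}
```

With a base layer holding "cow" and "duck" and a child layer adding "bunny", the stack assigns cow=1, duck=2, bunny=3. After squashing, the single sorted dictionary assigns bunny=1, cow=2, duck=3, so every ID a caller held before the squash now points at the wrong entry.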

We need an optional extra file in the layer directory which maps the old ID to the new, inner ID. A logarray should suffice for this.

(Informally this has been referred to as 'pointer swizzling' or 'ID swizzling'.)

We actually need two extra structures, since we have two separate ID ranges: one for node+value, and another for predicates.
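A rough sketch of what the optional remapping could look like, with one table per ID range. It assumes 1-based IDs and uses a plain `Vec<u64>` as a stand-in for a logarray. `IdMap` and `LayerIdMaps` are hypothetical names for illustration, not types from the crate.

```rust
/// Maps an outer (old, query-facing) ID to an inner (new, dictionary) ID.
/// Assumption: IDs are 1-based, so index i holds the inner ID for outer ID i + 1.
/// A Vec<u64> stands in for a logarray here.
struct IdMap {
    table: Vec<u64>,
}

impl IdMap {
    /// O(1) lookup. Returns None for ID 0 or an ID past the end of the table.
    fn outer_to_inner(&self, outer: u64) -> Option<u64> {
        let index = outer.checked_sub(1)? as usize;
        self.table.get(index).copied()
    }
}

/// The two optional remappings of a layer. None models the case where the
/// extra file is absent and IDs are used unchanged.
struct LayerIdMaps {
    node_value: Option<IdMap>,
    predicate: Option<IdMap>,
}

impl LayerIdMaps {
    fn inner_node_value_id(&self, outer: u64) -> Option<u64> {
        match &self.node_value {
            Some(map) => map.outer_to_inner(outer),
            None => Some(outer),
        }
    }

    fn inner_predicate_id(&self, outer: u64) -> Option<u64> {
        match &self.predicate {
            Some(map) => map.outer_to_inner(outer),
            None => Some(outer),
        }
    }
}
```

For example, a node+value table of `[2, 3, 1]` sends outer IDs 1, 2, 3 to inner IDs 2, 3, 1, while a layer without a predicate table passes predicate IDs through untouched.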

We need to be able to quickly map from a string to an ID and from an ID to a string, so the remapping has to be fast in both directions. A good fit for this may be a wavelet tree, which can do the mapping both ways in O(log(n)). An alternative is to store two logarrays, one for each direction of the mapping, which would give us O(1) lookups at the cost of extra memory use.
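The two-array alternative can be sketched as follows, assuming 1-based IDs and a mapping that is a permutation of 1..=n. Plain `Vec<u64>`s stand in for logarrays, and `BiMap`, `invert` and `lookup` are hypothetical names, not part of the crate.

```rust
/// Build the inverse of a 1-based permutation table.
/// Returns None if the input is not a permutation of 1..=n.
fn invert(outer_to_inner: &[u64]) -> Option<Vec<u64>> {
    let n = outer_to_inner.len();
    // 0 marks a slot that has not been filled yet (0 is never a valid ID here).
    let mut inner_to_outer = vec![0u64; n];
    for (i, &inner) in outer_to_inner.iter().enumerate() {
        let slot = inner.checked_sub(1)? as usize;
        if slot >= n || inner_to_outer[slot] != 0 {
            return None; // out of range, or the same inner ID used twice
        }
        inner_to_outer[slot] = i as u64 + 1;
    }
    Some(inner_to_outer)
}

/// O(1) table lookup for a 1-based ID.
fn lookup(table: &[u64], id: u64) -> Option<u64> {
    let index = id.checked_sub(1)? as usize;
    table.get(index).copied()
}

/// One array per direction: constant-time lookups, twice the memory.
struct BiMap {
    outer_to_inner: Vec<u64>,
    inner_to_outer: Vec<u64>,
}

impl BiMap {
    fn new(outer_to_inner: Vec<u64>) -> Option<Self> {
        let inner_to_outer = invert(&outer_to_inner)?;
        Some(BiMap { outer_to_inner, inner_to_outer })
    }

    fn to_inner(&self, outer: u64) -> Option<u64> {
        lookup(&self.outer_to_inner, outer)
    }

    fn to_outer(&self, inner: u64) -> Option<u64> {
        lookup(&self.inner_to_outer, inner)
    }
}
```

The trade-off is the one described above: both directions are a single array access, but the mapping is stored twice. A wavelet tree would store it once and pay O(log(n)) per lookup instead.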

This has now been implemented as part of delta rollups.