laforge49 / aatree

Immutable AA Tree

Virtual AA-Trees

laforge49 opened this issue · comments

We already have basic immutable aatree-based structures and lazy structures which provide ridiculously fast incremental deserialization and re-serialization. The next step is virtual structures which operate as ordinary sorted maps, vectors and sorted sets but which can be larger than will fit in memory.

These virtual structures will make use of the disk space management support provided by the Yearling database.

Public functions in lazy-nodes specific to lazy have been renamed.

IFactory, factory-regestry, AAContext and related functions have been moved from lazy-nodes to nodes, as this code is not specific to lazy structures.

The functions byte-size, load-vector, load-sorted-map and load-sorted-set are now indirect via options. This allows them to be different for virtual aa structures.
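To illustrate what "indirect via options" can look like, here is a minimal sketch in Python (used only for illustration; aatree itself is Clojure). The dictionary keys, the `block` field and the `reference_size` option are hypothetical names chosen for this example, not the library's actual API.

```python
def lazy_byte_size(node, opts):
    # Hypothetical lazy implementation: the node is always stored in-line,
    # so its size is simply the length of its serialized data.
    return len(node["data"])


def virtual_byte_size(node, opts):
    # Hypothetical virtual implementation: a node that has been pushed out
    # to a disk block only costs the size of a fixed-length reference.
    if node.get("block") is not None:
        return opts["reference_size"]
    return len(node["data"])


def byte_size(node, opts):
    # Callers never name an implementation directly; they look it up in
    # opts, so lazy and virtual structures can share the calling code.
    return opts["byte_size"](node, opts)


lazy_opts = {"byte_size": lazy_byte_size}
virtual_opts = {"byte_size": virtual_byte_size, "reference_size": 16}
```

The same common code can then be driven with either `lazy_opts` or `virtual_opts`, which is the point of moving the lookup into the options.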

By defining a WrapperNode interface, implemented by LazyNode, much of the code in lazy-nodes becomes common code that can be moved to nodes.

Lazy-nodes has been shrunk to 174 lines. Small enough that it is reasonable to clone as a starting point for virtual-nodes.

Virtual-nodes has been stubbed by cloning lazy-nodes.

Yearling now uses virtual nodes by default. This gives us a test environment.

The serialized data for non-empty nodes contains an extra byte used to distinguish between in-line serialized data and a serialized reference to a disk block.
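A rough sketch of the flag-byte idea, in Python for illustration only. This thread only states that a reference uses flag value 1; the value 0 for in-line data, the 8-byte position plus 4-byte length reference layout, and the `read_block` callback are all assumptions made for this example.

```python
import struct

INLINE_FLAG = 0     # assumed value for in-line serialized data
REFERENCE_FLAG = 1  # value this thread gives for a disk block reference


def write_inline(payload: bytes) -> bytes:
    # Prefix the serialized node with the flag byte.
    return bytes([INLINE_FLAG]) + payload


def write_reference(position: int, length: int) -> bytes:
    # Hypothetical reference layout: big-endian 8-byte position, 4-byte length.
    return bytes([REFERENCE_FLAG]) + struct.pack(">QI", position, length)


def read_node(buf: bytes, read_block) -> bytes:
    # Inspect the flag byte to decide where the node's data actually lives.
    flag, body = buf[0], buf[1:]
    if flag == INLINE_FLAG:
        return body
    if flag == REFERENCE_FLAG:
        position, length = struct.unpack(">QI", body[:12])
        return read_block(position, length)
    raise ValueError(f"unknown flag byte: {flag}")
```

A reader that sees the reference flag has to fetch the block before it can deserialize the node, which is why the extra byte matters to the deserialization logic.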

When the uber map is too large to fit in a root block, Yearling calls ((:as-reference) node opts) to convert the node into a reference to a disk block.

Moved cs256 from core to nodes.

The case where the root node is too large is now handled, but we cannot yet access the data that has been pushed to a separate disk block. That has to be handled in the deserialization logic when the reference flag is 1.

Reads from non-root nodes are now working.

At this point, virtually any update will result in a disk block leak, as a new block is allocated with each update without releasing the old block.

Leak fixed. On completion of an update, blocks referenced by nodes which were subsequently changed are released. This includes blocks referenced by nodes whose parent nodes were removed. Note that the deleted subtree of nodes can be quite large when, for example, a large vector is removed from a map, so the operation may require considerable memory as well as time.
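One naive way to picture the release step, sketched in Python for illustration only: collect the blocks referenced by the old tree and by the new tree, then release whatever the new tree no longer references. The dictionary-shaped nodes, the `block` and `children` fields and the `release` callback are hypothetical, and the actual fix may track changed nodes during the update rather than walking both trees.

```python
def referenced_blocks(node, found=None):
    # Walk a tree of dict-shaped nodes and collect every disk block id.
    if found is None:
        found = set()
    if node is None:
        return found
    if node.get("block") is not None:
        found.add(node["block"])
    for child in node.get("children", ()):
        referenced_blocks(child, found)
    return found


def release_stale_blocks(old_root, new_root, release):
    # Blocks still referenced by the new tree (shared, unchanged nodes)
    # must be kept; everything else from the old tree can be released.
    stale = referenced_blocks(old_root) - referenced_blocks(new_root)
    for block in sorted(stale):
        release(block)
    return stale
```

Walking a removed subtree is what makes the cost proportional to its size, which matches the note about memory and time.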

The issue now is to split blocks which become too big.

Blocks are splitting nicely. A bit more testing and we are done.

Put 100K map entries into Yearling, 1000 at a time. Worked nicely.

OK, some documentation is needed. :-)

Both the wiki and the readme have now been updated. Done!