facebook / akd

An implementation of an auditable key directory

Efficient preloading of nodes during directory publish

kevinlewi opened this issue

Currently publish works in 5 phases:
Phase 1: Load relevant user states from storage to see if previous versions of what should be inserted already exist
Phase 2: Translate a vec of (AkdLabel, AkdValue) into a vec of (NodeLabel, Vec), which are the labels and values that will be used to form tree nodes. This step requires computationally intensive VRF computations.
Phase 3: Now that we have a set of NodeLabels we want to insert, preload all of the nodes that need to be accessed from storage, before actually figuring out what values to insert. This lets the next step read only from the cache, without hitting storage at all.
Phase 4: Recursively insert leaves by starting from the root and going down the tree, figuring out where each node should go. This should only read from cache and produce a bunch of writes that we need to write back into storage.
Phase 5: Commit the transaction (do a single batch write of all updated nodes).
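
To make the Phase 3 idea concrete, here is a minimal sketch of a breadth-first preload, not the actual akd code. Everything in it is an assumption made for illustration: labels are plain bit strings, `Node`, `Storage`, `batch_get` and `preload` are hypothetical names, and the storage layer is an in-memory map that counts batch reads. It issues one batch read per tree level and follows only children whose label is a prefix of a label we plan to insert. A real implementation may also need sibling nodes for hashing, which this sketch skips.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a tree node: labels are bit strings such as "01".
#[derive(Clone, Debug)]
struct Node {
    label: String,
    children: Vec<String>, // labels of the (up to two) children
}

/// Hypothetical storage layer with a batch read; counts round trips.
struct Storage {
    nodes: HashMap<String, Node>,
    batch_reads: usize,
}

impl Storage {
    fn batch_get(&mut self, labels: &[String]) -> Vec<Node> {
        self.batch_reads += 1;
        labels
            .iter()
            .filter_map(|l| self.nodes.get(l).cloned())
            .collect()
    }
}

/// Breadth-first preload: one batch read per level, following only
/// children whose label is a prefix of some label we plan to insert.
fn preload(storage: &mut Storage, targets: &[String]) -> HashMap<String, Node> {
    let mut cache = HashMap::new();
    let mut frontier = vec![String::new()]; // the root has the empty label
    while !frontier.is_empty() {
        let fetched = storage.batch_get(&frontier);
        frontier.clear();
        for node in fetched {
            for child in &node.children {
                if targets.iter().any(|t| t.starts_with(child.as_str())) {
                    frontier.push(child.clone());
                }
            }
            cache.insert(node.label.clone(), node);
        }
    }
    cache
}

/// Small five-node tree used to exercise the sketch.
fn sample_storage() -> Storage {
    let mut nodes = HashMap::new();
    for (label, children) in [
        ("", vec!["0", "1"]),
        ("0", vec!["00", "01"]),
        ("1", vec![]),
        ("00", vec![]),
        ("01", vec![]),
    ] {
        let children = children.into_iter().map(String::from).collect();
        nodes.insert(label.to_string(), Node { label: label.to_string(), children });
    }
    Storage { nodes, batch_reads: 0 }
}
```

With the sample tree and a single target label `"01"`, the preload touches the root, `"0"` and `"01"` in three batch reads and leaves `"1"` and `"00"` in storage.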

Ideally, after we preload nodes in Phase 3, Phase 4 never hits storage; otherwise, things could end up being much slower than we anticipate. What I'm curious about is that, during Phase 4, we currently load a TreeNodeWithPreviousValue and optionally decide whether or not we need the previous version of a node.

However, I don't believe these previous versions have been preloaded yet, have they? If not, perhaps in Phase 3, we should attempt to preload them. This would involve adding a pointer from a tree node to the label corresponding to its previous version. I'm not sure how efficient the BFS would be, but maybe it would still be doable. Or maybe we can just hope that these nodes are already in the cache?

At any rate, another thing we could benefit from here is adding explicit tests that measure how many times we hit storage in Phase 4.
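
One way such a test could look, as a rough sketch only: wrap the cache in a type that counts every read that falls through to storage, then assert on the counter after the insert phase. `CountingCache`, its fields, and `storage_reads_in_phase4` are hypothetical names invented here, not akd APIs, and node values are reduced to a `u64`.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical read-through cache that records every fall-through to storage.
struct CountingCache {
    cache: HashMap<String, u64>,   // label -> node value (simplified)
    backing: HashMap<String, u64>, // stands in for the real storage layer
    storage_reads: Cell<usize>,
}

impl CountingCache {
    fn new(backing: HashMap<String, u64>) -> Self {
        CountingCache {
            cache: HashMap::new(),
            backing,
            storage_reads: Cell::new(0),
        }
    }

    /// Phase 3 stand-in: copy the given labels from storage into the cache.
    fn preload(&mut self, labels: &[&str]) {
        for l in labels {
            if let Some(v) = self.backing.get(*l) {
                self.cache.insert((*l).to_string(), *v);
            }
        }
    }

    /// Cache lookup; a miss is counted and served from storage.
    fn get(&self, label: &str) -> Option<u64> {
        if let Some(v) = self.cache.get(label) {
            return Some(*v);
        }
        self.storage_reads.set(self.storage_reads.get() + 1);
        self.backing.get(label).copied()
    }
}

/// Phase 4 stand-in: read every accessed label and report how many of
/// those reads had to go to storage.
fn storage_reads_in_phase4(preloaded: &[&str], accessed: &[&str]) -> usize {
    let backing: HashMap<String, u64> = ["", "0", "01"]
        .iter()
        .enumerate()
        .map(|(i, l)| (l.to_string(), i as u64))
        .collect();
    let mut cache = CountingCache::new(backing);
    cache.preload(preloaded);
    for label in accessed {
        let _ = cache.get(label);
    }
    cache.storage_reads.get()
}
```

A test would then assert that the count is zero when everything on the insertion path was preloaded, and nonzero when something was left out.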

Ah, the TreeNodeWithPreviousValue only exists to handle the parallel-readers, single-writer distributed system problem. If we have a single directory writer that's updating the tree, but parallel readers serving up proofs, then the readers may see a "future" node in storage for an epoch they don't yet know exists.

This is because not all storage layers are globally atomic; if they were, we wouldn't need this. But imagine a directory writer is currently "committing" the next epoch to the storage layer, and one of the proof-generator nodes reads a "future" value for a TreeNode in the tree. It then really needs the "previous" value. It's a pretty edge-case scenario, but one we wanted covered.
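
As a rough illustration of that reader-side choice (a simplified sketch, not the real akd types): a record keeps the latest version of a node plus an optional previous one, and a reader picks whichever version is not newer than the epoch it knows about. `NodeVersion`, `NodeWithPrevious`, `version_for` and the field names are all assumptions made for this example.

```rust
/// One stored version of a tree node (simplified; the hash is a placeholder).
#[derive(Clone, Debug, PartialEq)]
struct NodeVersion {
    last_epoch: u64, // epoch in which this version was written
    hash: [u8; 4],
}

/// Hypothetical pairing of the latest version with the one before it.
struct NodeWithPrevious {
    latest: NodeVersion,
    previous: Option<NodeVersion>,
}

impl NodeWithPrevious {
    /// Pick the version visible at `known_epoch`. A reader that has not yet
    /// learned about the newest epoch falls back to the previous version.
    fn version_for(&self, known_epoch: u64) -> Option<&NodeVersion> {
        if self.latest.last_epoch <= known_epoch {
            Some(&self.latest)
        } else {
            self.previous
                .as_ref()
                .filter(|p| p.last_epoch <= known_epoch)
        }
    }
}

/// A node last written in epoch 6, with its epoch-4 version retained.
fn sample_node() -> NodeWithPrevious {
    NodeWithPrevious {
        latest: NodeVersion { last_epoch: 6, hash: [6, 6, 6, 6] },
        previous: Some(NodeVersion { last_epoch: 4, hash: [4, 4, 4, 4] }),
    }
}
```

In this model a reader that only knows about epoch 5 gets the epoch-4 version, while the writer, which always knows the newest epoch, only ever takes `latest`.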

For the directory writer, it should always take the most up-to-date value as it's the only one writing to the data layer. I don't think there's a problem here with preloads imo. But let me know if I missed the point!

I see! So the previous value of a TreeNodeWithPreviousValue never needs to be retrieved during directory publish, is that correct? We can close this issue if so. Thanks for clarifying my understanding!

Yes! Correct, I'll close the issue then :)