facebook / akd

An implementation of an auditable key directory

Addressing atomicity of the data-layer

slawlor opened this issue · comments

A distributed deployment of AKD (single writer, multiple readers) is generally a valid use-case: to scale to hundreds of thousands, millions, or billions of users, a single host can't support the proof-request load, and you likely need hundreds of hosts sharing a distributed data-layer.

Now further assume that the data layer doesn't support a global atomic commit of all the changes (of which there can be millions of individual record changes for a large epoch). There is a write-commit strategy that can be used within AKD to support this (see the sketch after the list below):

  1. Incrementally (batch) commit all of the records except for the AZKS struct.
  2. Once all other records are committed and correct, commit the AZKS struct, which triggers the reader nodes to flush their local caches and re-load data from the data-layer.
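
A minimal sketch of that ordering, assuming a hypothetical storage trait (the `Record`, `Azks`, `batch_set`, and `set_azks` names below are illustrative, not the actual akd API):

```rust
/// Hypothetical record type; the real akd storage types differ.
struct Record {
    key: Vec<u8>,
    value: Vec<u8>,
    epoch: u64,
}

/// Hypothetical AZKS struct; only the committed epoch matters here.
struct Azks {
    latest_epoch: u64,
}

/// Hypothetical storage interface; the real akd `Storage` trait differs.
trait Storage {
    fn batch_set(&self, records: &[Record]) -> Result<(), String>;
    fn set_azks(&self, azks: &Azks) -> Result<(), String>;
}

/// Commit all non-AZKS records first (in batches), then publish the AZKS
/// last, so readers only observe the new epoch once every record is durable.
fn commit_epoch<S: Storage>(
    storage: &S,
    records: Vec<Record>,
    new_azks: Azks,
    batch_size: usize,
) -> Result<(), String> {
    // Step 1: incrementally commit everything except the AZKS struct.
    for chunk in records.chunks(batch_size) {
        storage.batch_set(chunk)?;
    }
    // Step 2: only once all other records are durable, commit the AZKS.
    // Readers key their target epoch off this record, so a crash before
    // this point leaves only "future" records that readers never serve.
    storage.set_azks(&new_azks)?;
    Ok(())
}
```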

We've already addressed most of the scenarios related to this:

  1. Readers only increment the target epoch on an AZKS update (#148, #149)
  2. When retrieving nodes, states, etc., readers limit queries to the current epoch so they don't retrieve information from the future (i.e., from a partially written next epoch) (#145). A sketch of this filtering follows the list.
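
For illustration only, a reader-side filter along these lines (hypothetical types, not the actual akd node/state structs) drops anything written for an epoch beyond the committed AZKS epoch:

```rust
/// Hypothetical node-state record; the real akd types differ.
#[derive(Clone)]
struct NodeState {
    label: Vec<u8>,
    epoch: u64,
}

/// Keep only states visible at or before the committed epoch, so a
/// partially written "future" epoch is never served to clients.
fn visible_states(states: &[NodeState], committed_epoch: u64) -> Vec<NodeState> {
    states
        .iter()
        .filter(|s| s.epoch <= committed_epoch)
        .cloned()
        .collect()
}
```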

However, there are some cases where, if the single writer fails during the write, it will recover but may end up reading a future ValueState record, since it doesn't currently filter by the epoch (I think).

Additionally, this issue is to identify all the places where we may need to read an older value than what's actually committed in the database due to a partially written state.

CC: @afterdusk, @eozturk1, @Jasleen1, @kevinlewi

commented

> However, there are some cases where, if the single writer fails during the write, it will recover but may end up reading a future ValueState record, since it doesn't currently filter by the epoch (I think).

Wouldn’t transactions prevent a writer from starting up while the data layer is in a partially published state? Or is this task about avoiding this situation without needing a transaction?

Side note: I believe we do already filter on the current epoch when fetching ValueStates in publish: https://github.com/novifinancial/akd/blob/main/akd/src/directory.rs#L132-L135
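
For reference, the shape of such a filter might look like the sketch below (hypothetical `ValueState` fields and function names; the linked directory.rs is the authoritative version):

```rust
/// Hypothetical value-state record; see the linked directory.rs for the
/// actual akd types and retrieval flags.
struct ValueState {
    username: String,
    epoch: u64,
    plaintext_val: Vec<u8>,
}

/// During publish, only consider value states at or before the current
/// committed epoch, ignoring any leftover "future" records from a
/// previously interrupted publish.
fn latest_committed_state(
    states: &[ValueState],
    current_epoch: u64,
) -> Option<&ValueState> {
    states
        .iter()
        .filter(|s| s.epoch <= current_epoch)
        .max_by_key(|s| s.epoch)
}
```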

Yeah, it's more about the case where the writer crashed and partial data had been written to the storage layer. My concern (which I wanted to check with this issue) is: if "some" data is updated (but not the AZKS), would we be able to overwrite those partial records when we start publishing again?

I think your side note addresses the concern that prompted me to open this issue :p Seems like it's moot. But I'm still going to check most of the access points to make sure we aren't retrieving partially-future data (just to be safe). Seems like we're OK though; I was worried we had missed this one point.