Roadmap
sentientwaffle opened this issue · comments
djg commented
The tasks listed are not guaranteed, and they are not ordered.
The intent is to give a sense of the project's direction.
- 🔴: Must be done for the production release.
- 🟡: Nice-to-have for the production release.
Stability: Storage
- VSR: Manifest free-set. 🔴
- VSR: 256-byte headers. 🔴
- VSR: State machine version in headers. 🔴
- VSR: Async checkpoints. 🔴
- LSM: Remove filter blocks. 🔴
- LSM: Re-implement secondary index tombstone optimization. 🟡 #1352
- LSM: Compaction optimizations
  - Coalesce small adjacent tables (context: #463). 🔴
  - "Move-data-block" (more granular than "move table").
  - Start the next round of compaction reads before starting the merge (CPU) work.
- The last level of each tree should have double-size tables, but half as many. 🔴
- VSR: Size the manifest trailer, and pace manifest compaction, to guarantee capacity. 🔴
- VSR: Encode configuration data into the superblock. 🔴
- VSR: Reserve more space in `SuperBlock.VSRState` for future use. 🔴
- Guard against running a binary against a data file that was created with a different configuration. 🟡
- LSM: Add value count to `TableInfo`. (And possibly value-block count, since compression will decouple the ratio between the two.) 🔴
- Redo snapshots. 🔴
  - Snapshots should be relative to the op that "creates" them, not the op that compacts them.
  - Maybe use timestamps instead of ops as snapshot ids.
  - Store the snapshot in the manifest block header (like we do for all other blocks).
- Reserve some extra space in the superblock for future use, just in case. (Since "growing" the superblock is not possible once a replica is formatted.)
- VSR: Align grid zone start to grid block size. 🔴
- VSR: Remove superblock trailers. 🔴
  - Encode the client sessions trailer into the grid.
  - Encode the manifest trailer into grid blocks. (As an on-disk doubly-linked list.)
  - Encode the manifest free-set into one grid block.
  - Increase the number of superblock copies, since they will be so much smaller.
- VSR: Panic on nondeterminism; don't try to recover via state sync. 🔴
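To make the "manifest trailer as an on-disk doubly-linked list" item above concrete, here is a minimal Python sketch. The names (`ManifestBlock`, `Grid`) and the address scheme are illustrative only, not TigerBeetle's actual on-disk layout: each block's header records its neighbors' addresses, so the superblock only needs to reference the head of the chain instead of embedding a large trailer.

```python
# Illustrative model (NOT TigerBeetle's real code): the manifest stored as a
# doubly-linked list of grid blocks. Each block header carries the addresses
# of its neighbors; address 0 means "no neighbor".
from dataclasses import dataclass, field

@dataclass
class ManifestBlock:
    address: int        # grid block address of this block
    entries: list       # manifest entries packed into this block
    prev: int = 0       # address of the previous block in the chain
    next: int = 0       # address of the next block in the chain

@dataclass
class Grid:
    blocks: dict = field(default_factory=dict)  # address -> ManifestBlock

    def append_manifest_block(self, tail: int, address: int, entries: list) -> int:
        """Link a new block after `tail`; return the new tail address."""
        block = ManifestBlock(address=address, entries=entries, prev=tail)
        if tail:
            self.blocks[tail].next = address
        self.blocks[address] = block
        return address

    def walk_manifest(self, head: int) -> list:
        """Follow `next` pointers from the head, collecting all entries."""
        entries, address = [], head
        while address:
            block = self.blocks[address]
            entries.extend(block.entries)
            address = block.next
        return entries

grid = Grid()
tail = grid.append_manifest_block(0, address=101, entries=["table A"])
tail = grid.append_manifest_block(tail, address=102, entries=["table B", "table C"])
assert grid.walk_manifest(101) == ["table A", "table B", "table C"]
```

The `prev` links are what make compacting the chain cheap: a block can be unlinked from the middle without scanning from the head.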
Stability: API
- Client: Add automatic batching to client implementations. (#489, #523) 🟡
- Client: Handle evictions gracefully (e.g. throw an error, allow reconnect; don't panic!) 🟡
- StateMachine: Maximum linked chain size, to limit `scope_rollback_log` size. (Maybe? 🔴)
- StateMachine: `get_account_transfers` (temporary feature until the full query API is done). (Requires range queries.) 🔴
- StateMachine: Store point-in-time balances.
  - (Maybe this is no longer in the roadmap due to #1157 un-splitting the Account grooves?)
- StateMachine: Pending transfer timeouts. (Requires range queries.) 🔴
- StateMachine: Close account (maybe done by #449?)
- StateMachine: Add a bulk-import path for data (including timestamps). (Probably needs a CLI for this too.) 🟡
- StateMachine: Query API. (Requires range queries.) 🟡
- Clients: Expose `flags` as a struct-of-booleans instead of an integer bitset. (The Golang client does this already; the Node client does not.) 🟡
- Persistent snapshots + historical queries (e.g. "what was the balance of account A at time T?").
- Bitemporal data (maybe?)
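As an illustration of the struct-of-boolean `flags` item above, a minimal Python sketch. The field names and bit positions here are examples for illustration, not the actual wire format or client API: the client exposes named booleans, and converts to/from the integer bitset at the serialization boundary.

```python
# Illustrative only: `flags` as a struct of booleans that round-trips
# to/from the integer bitset sent over the wire. Bit positions are
# hypothetical examples, not TigerBeetle's actual encoding.
from dataclasses import dataclass

@dataclass
class TransferFlags:
    linked: bool = False
    pending: bool = False

    def to_bitset(self) -> int:
        """Pack the booleans into the wire-format integer."""
        return (self.linked << 0) | (self.pending << 1)

    @classmethod
    def from_bitset(cls, bits: int) -> "TransferFlags":
        """Unpack the wire-format integer into named booleans."""
        return cls(linked=bool(bits & 1), pending=bool(bits & 2))

flags = TransferFlags(pending=True)
assert flags.to_bitset() == 2
assert TransferFlags.from_bitset(3) == TransferFlags(linked=True, pending=True)
```

The benefit is that callers write `TransferFlags(pending=True)` instead of remembering magic bit values, and invalid bit combinations can be rejected in one place.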
Safety
- VSR: State sync (to catch up when lagging by more than one WAL). 🔴
- VSR: Include the checkpoint identifier in `prepare` messages instead of `prepare_ok` messages. (Requires 256-byte headers.) 🔴
- VSR: Remove the state sync kludge. (Requires async checkpoints.) 🔴
- VSR: Grid scrubber, to guard against double-faults. 🟡 (This is mostly done.)
- VSR: `repair_pipeline_read_callback` recurses when messages are cached in the pipeline. Restructure it to avoid the risk of stack overflow. 🟡
- Storage: Audit TODOs in `linux.zig` and `src/storage.zig`. 🔴
- VSR: Write + erase a random number of sectors during replica formatting, so that if all replicas are deployed to the same model of SSD, they are not overexposed to faults that impact the same physical block address on each SSD.
- Note that this does not need to impact the storage format at all.
Performance
- StateMachine: Optimistic state machine execution.
- LSM: Compaction beat pacing. 🟡
  - Spread work more evenly between beats (to avoid latency spikes at the end of a half-bar).
  - Make LSM storage deterministic at the end of each beat (instead of only at the end of each half-bar).
- LSM: Compaction optimizations
- LSM: Fix sequential grid-read bottleneck.
- LSM: Manifest log open prefetch.
- LSM: Add "sequential" bit for constant-time lookup in consecutive-key value block.
- LSM: Compress value blocks.
- VSR: Fix checkpoint latency spike:
  - VSR: Allow queuing requests during checkpoint. (See #558)
  - VSR: Async checkpoints. 🔴
- VSR: Grid block reference-counting or cache/stash, to avoid internal block copying during compaction.
- VSR: Adaptive message timeouts.
- VSR: To speed up grid block sync, allow a replica to intelligently send blocks before they are asked for. (This is important for e.g. manifest repair, which is otherwise sequential.) The receiving replica should stash these in its grid block pool so that it can (hopefully) avoid a round trip to repair them.
Experience: Operations
- LSM: Runtime-configurable NodePool size. 🔴 (#1447)
- LSM: Default NodePool size (`lsm_forest_node_count`). (Currently it is constant and too small.) 🔴
- LSM: The replica must panic "nicely" (i.e. with a log message) if `NodePools.acquire()` has no nodes available. 🔴
- LSM: The replica must panic nicely if the Grid has insufficient free blocks. 🔴
- LSM: The replica must panic nicely if the forest has insufficient tables. (Don't exceed `table_count_max`.) 🔴
- VSR: Reconfiguration protocol
  - Add/remove replicas from the cluster.
  - Coordinate rolling replica version upgrades.
- VSR: Improve asymmetric partition tolerance.
- VSR: Table sync congestion control.
- DNS addressing/lookups (#74)
- Metrics (e.g. Prometheus)
- Structured logging, to make parsing/indexing/searching easier
- Support for TLS between clients and replicas
- Disaster recovery tool/mechanism to repair storage determinism problems. (TBD)
- Document all CLI arguments.
Experience: Client
- Detect + fail on client/server version or configuration mismatch. 🔴
- Also #316
- Node client should use `tb_client`.
Testing
- VOPR(hub) running without errors. 🔴
  - #1020, etc. 🔴
- VOPR: Test different configurations.
- VOPR: Test additional storage faults.
- VOPR: Sometimes run with an unrestricted number of faults.
- Create StateMachine-level fuzzer. (Probably using Workload).
- Explore more workloads for the forest fuzzer.
- Antithesis. 🟡
- Test a "full" LSM, to make sure it properly rejects requests. 🟡
- Fuzz different compile-time and run-time configurations.
- Fuzz all components: #189. (Maybe this is unnecessary? Forest fuzzer is a higher priority.)
- Explicit code coverage marks. (Maybe reuse structured logging or metrics?)
Documentation
- Document security model.