tigerbeetle / tigerbeetle

The distributed financial transactions database designed for mission critical safety and performance.

Home page: https://tigerbeetle.com



Roadmap

sentientwaffle opened this issue

The tasks listed are not guaranteed, and they are not ordered.
The intent is to give a sense of the project's direction.

  • 🔴: Must be done for the production release.
  • 🟡: Nice-to-have for the production release.

Stability: Storage

  • VSR: Manifest free-set. 🔴
  • VSR: 256-byte headers. 🔴
    • VSR: State machine version in headers. 🔴
  • VSR: Async checkpoints. 🔴
  • LSM: Remove filter blocks. 🔴
  • LSM: Re-implement secondary index tombstone optimization. 🟡 #1352
  • LSM: Compaction optimizations
    • Coalesce small adjacent tables (context: #463). 🔴
    • "Move-data-block" (more granular than "move table").
    • Start the next round of compaction reads before starting the merge (CPU) work.
    • The last level of each tree should have double-size tables, but half as many. 🔴
  • VSR: Size the manifest trailer, and pace manifest compaction, to guarantee capacity. 🔴
  • VSR: Encode configuration data into the superblock. 🔴
  • VSR: Reserve more space in SuperBlock.VSRState for future use. 🔴
  • Guard against running a binary against a data file that was created with a different configuration. 🟡
  • LSM: Add value count to TableInfo. (And possibly value-block count, since compression will decouple the ratio between the two.) 🔴
  • Redo snapshots. 🔴
    • Snapshots should be relative to the op that "creates" them, not the op that compacts them.
    • Maybe use timestamps instead of ops as snapshot ids.
    • Store the snapshot in the manifest block header (as we do for all other blocks).
  • Reserve some extra space in the superblock for future use, just in case? (Since "growing" the superblock is not possible once a replica is formatted.)
  • VSR: Align the grid zone start to the grid block size. 🔴
  • VSR: Remove superblock trailers. 🔴
    • Encode the client sessions trailer into the grid.
    • Encode the manifest trailer into grid blocks, as an on-disk doubly-linked list (see the sketch after this list).
    • Encode the manifest free-set into one grid block.
    • Increase the number of superblock copies, since they will be so much smaller.
  • VSR: Panic on nondeterminism; don't try to recover via state sync. 🔴
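
As a rough illustration of the on-disk doubly-linked list mentioned above: each manifest block could carry the addresses and checksums of its neighbors in its header, so the chain can be walked and verified in either direction. This is a minimal sketch; the field names and layout are assumptions, not TigerBeetle's actual format:

```zig
// Hypothetical sketch only: field names and layout are assumed, not
// TigerBeetle's actual on-disk format.
const ManifestBlockHeader = extern struct {
    checksum: u128,
    address: u64,
    // Chain pointers forming the on-disk doubly-linked list.
    // An address of 0 marks the head (no previous) or tail (no next).
    previous_address: u64,
    next_address: u64,
    // Neighbor checksums, so a reader can verify each link as it walks
    // the chain and detect a torn or misdirected write.
    previous_checksum: u128,
    next_checksum: u128,
};
```

Walking next_address from a head reference held in the superblock would then recover the whole manifest without needing a superblock trailer.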

Stability: API

  • Client: Add automatic batching to client implementations. (#489, #523) 🟡
  • Client: Handle evictions gracefully (e.g. throw an error, allow reconnecting – don't panic!) 🟡
  • StateMachine: Maximum linked chain size, to limit scope_rollback_log size. (Maybe? 🔴)
  • StateMachine: get_account_transfers (temporary feature until the full query API is done). (Requires range queries.) 🔴
  • StateMachine: Store point-in-time balances.
    • (Maybe this is no longer on the roadmap due to #1157 un-splitting the Account grooves?)
  • StateMachine: Pending transfer timeouts. (Requires range queries.) 🔴
  • StateMachine: Close account. (Maybe done by #449?)
  • StateMachine: Add a bulk-import path for data (including timestamps). (Probably needs a CLI too.) 🟡
  • StateMachine: Query API. (Requires range queries.) 🟡
  • Clients: Expose flags as a struct of booleans instead of an integer bitset. (The Go client does this already; the Node client does not. See the sketch after this list.) 🟡
  • Persistent snapshots + historical queries (e.g. "what was the balance of account A at time T?").
  • Bitemporal data (maybe?)
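
To illustrate the struct-of-booleans item above: in Zig, a packed struct of bools is bit-for-bit identical to the wire-format integer bitset, so a client can expose named flags without changing the encoding. A minimal sketch, using an assumed subset of transfer flags:

```zig
const std = @import("std");

/// Sketch only: an assumed subset of transfer flags, shown as a packed
/// struct of booleans that reinterprets cleanly as a u16 bitset.
const TransferFlags = packed struct(u16) {
    linked: bool = false,
    pending: bool = false,
    post_pending_transfer: bool = false,
    void_pending_transfer: bool = false,
    padding: u12 = 0,
};

test "flags round-trip through the integer bitset" {
    const flags = TransferFlags{ .linked = true, .pending = true };
    // Same bits either way: the struct is just a typed view of the integer.
    const bits: u16 = @bitCast(flags);
    try std.testing.expectEqual(@as(u16, 0b0011), bits);
    const decoded: TransferFlags = @bitCast(bits);
    try std.testing.expect(decoded.linked and decoded.pending);
}
```

Language clients without packed structs (e.g. Node) would expose the same shape with explicit shifts and masks at the serialization boundary.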

Safety

  • VSR: State sync (to catch up >1 WAL). 🔴
    • VSR: Include the checkpoint identifier in prepare messages instead of prepare_ok messages. (Requires 256-byte headers.) 🔴
    • VSR: Remove the state sync kludge. (Requires async checkpoints.) 🔴
  • VSR: Grid scrubber, to guard against double-faults. 🟡 (This is mostly done.)
  • VSR: repair_pipeline_read_callback recurses when messages are cached in the pipeline. Restructure it to avoid the risk of stack overflow. 🟡
  • Storage: Audit TODOs in linux.zig and src/storage.zig. 🔴
  • VSR: Write and then erase a random number of sectors during replica formatting, so that if all replicas are deployed to the same model of SSD, they are not overexposed to faults that hit the same physical block address on every SSD. (See the sketch after this list.)
    • Note that this does not need to impact the storage format at all.
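
A minimal sketch of what that formatting step could look like. The function name, sector size, and counting are assumptions for illustration; the key point is that the scrambling happens before the data file is laid down, so each replica's logical offsets land on different physical flash blocks:

```zig
const std = @import("std");

const sector_size = 4096;

/// Sketch only (name and parameters assumed): write a random number of
/// sectors to the freshly created data file, then truncate it back to empty.
/// By then the SSD's flash translation layer has mapped the file's first
/// logical sectors to different physical blocks on each replica, so a fault
/// tied to a physical address no longer hits the same logical offset
/// everywhere. The caller draws sector_count at random, once, per replica.
fn scramble_physical_layout(file: std.fs.File, sector_count: u32) !void {
    const sector: [sector_size]u8 align(sector_size) = [_]u8{0} ** sector_size;
    var i: u32 = 0;
    while (i < sector_count) : (i += 1) {
        try file.writeAll(&sector);
    }
    try file.sync(); // Make sure the writes actually reach the device.
    try file.setEndPos(0); // Discard the data; the remapping persists.
}
```

Because the file is truncated afterwards, the on-disk format is untouched, consistent with the note above.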

Performance

  • StateMachine: Optimistic state machine execution.
  • LSM: Compaction beat pacing. 🟡
    • Spread work more evenly between beats (to avoid latency spikes at the end of a half-bar).
    • Make LSM storage deterministic at the end of each beat (instead of only at the end of each half-bar).
  • LSM: Compaction optimizations
    • LSM: Fix sequential grid-read bottleneck.
  • LSM: Manifest log open prefetch.
  • LSM: Add "sequential" bit for constant-time lookup in consecutive-key value block.
  • LSM: Compress value blocks.
  • VSR: Fix the checkpoint latency spike:
    • VSR: Allow queuing requests during checkpoint. (See: #558)
    • VSR: Async checkpoints. 🔴
  • VSR: Grid block reference-counting or cache/stash, to avoid internal block copying during compaction.
  • VSR: Adaptive message timeouts.
  • VSR: To speed up grid block sync, allow a replica to intelligently send blocks before they are asked for. (This is important for e.g. manifest repair, which is otherwise sequential.) The receiving replica should stash these blocks in its grid block pool so that it can (hopefully) repair them without a network round trip. (See the sketch below.)
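
A rough sketch of the receiving side's stash. The type and its methods are assumptions for illustration; the idea is simply that unsolicited blocks are indexed by address so a later repair can be served from memory:

```zig
const std = @import("std");

// Sketch only (names assumed): unsolicited blocks received from other
// replicas, indexed by grid block address. A real implementation would
// verify each block's checksum before trusting it, and bound the stash.
const BlockStash = struct {
    blocks: std.AutoHashMap(u64, []const u8),

    fn init(allocator: std.mem.Allocator) BlockStash {
        return .{ .blocks = std.AutoHashMap(u64, []const u8).init(allocator) };
    }

    // Called when a block arrives that no read is currently waiting for.
    fn stash(self: *BlockStash, address: u64, block: []const u8) !void {
        try self.blocks.put(address, block);
    }

    // Called by repair: if the block was already pushed to us, use it
    // directly instead of requesting it over the network.
    fn take(self: *BlockStash, address: u64) ?[]const u8 {
        const entry = self.blocks.fetchRemove(address) orelse return null;
        return entry.value;
    }
};
```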

Experience: Operations

  • LSM: Runtime-configurable NodePool size. 🔴 (#1447)
  • LSM: Default NodePool size (lsm_forest_node_count). (Currently it is constant and too small.) 🔴
  • LSM: The replica must panic "nicely" (i.e. with a log message) if NodePools.acquire() has no nodes available. (See the sketch after this list.) 🔴
  • LSM: The replica must panic nicely if the Grid has insufficient free blocks. 🔴
  • LSM: The replica must panic nicely if the forest has insufficient tables. (Don't exceed table_count_max.) 🔴
  • VSR: Reconfiguration protocol
    • Add/remove replicas from the cluster.
    • Coordinate rolling replica version upgrades.
  • VSR: Improve asymmetric partition tolerance.
    • VSR: Table sync congestion control.
  • DNS addressing/lookups (#74)
  • Metrics (e.g. Prometheus)
  • Structured logging, to make parsing/indexing/searching easier
  • Support for TLS between clients and replicas
  • Disaster recovery tool/mechanism to repair storage determinism problems. (TBD)
  • Document all CLI arguments.
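
To make the "panic nicely" items above concrete, here is a minimal sketch. The types and the configuration name in the message are assumptions; the point is that exhausting a statically sized resource should terminate with an actionable log line rather than a bare assertion failure:

```zig
const std = @import("std");
const log = std.log.scoped(.lsm);

// Sketch only: a stand-in for the real node pool.
const Node = struct { bytes: [512]u8 };

const NodePool = struct {
    free: []Node,

    fn acquire(pool: *NodePool) *Node {
        if (pool.free.len == 0) {
            // Tell the operator what went wrong and which knob to turn,
            // then crash deterministically.
            log.err("NodePool exhausted: increase lsm_forest_node_count", .{});
            @panic("NodePool exhausted");
        }
        const node = &pool.free[pool.free.len - 1];
        pool.free = pool.free[0 .. pool.free.len - 1];
        return node;
    }
};
```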

Experience: Client

  • Detect + fail on client/server version or configuration mismatch. 🔴
  • Node client should use tb_client.

Testing

  • VOPR(hub) running without errors. 🔴
  • VOPR: Test different configurations.
  • VOPR: Test additional storage faults.
  • VOPR: Sometimes run with an unrestricted number of faults.
  • Create a StateMachine-level fuzzer. (Probably using Workload.)
  • Explore more workloads for the forest fuzzer.
  • Antithesis. 🟡
  • Test a "full" LSM, to make sure it properly rejects requests. 🟡
  • Fuzz different compile-time and run-time configurations.
  • Fuzz all components: #189. (Maybe this is unnecessary? Forest fuzzer is a higher priority.)
  • Explicit code coverage marks. (Maybe reuse structured logging or metrics?)

Documentation

  • Document security model.