Understand and simplify CometBFT database backend
lasarojc opened this issue
Supersedes tendermint/tendermint#6032, breaking that issue down into more concrete, clearly separated deliverables/sub-tasks.
It's ultimately expensive and painful for the team to have to cater to many different use cases that require different underlying storage engines, and we would rather converge on a single database that meets our core requirements well. We do, however, want to simultaneously provide ways for integrators to access and transform core data into whatever storage system suits their use case.
Problems
- Through cometbft-db, CometBFT currently supports multiple database backends. As such:
  - CometBFT does not make extensive use of database-specific optimizations.
  - Storage behavior is not consistent across different databases, potentially resulting in more troubleshooting and bug fixing work for the team (e.g. tendermint/tendermint#8416, #1017).
- While we could just decide to only support GoLevelDB (as per tendermint/tendermint#9741), one of the most commonly used underlying databases for CometBFT, it seems to struggle with pruning (see tendermint/tendermint#9743, informalsystems/interchain#1). It's also not clear, given our typical storage workloads for the most common use cases, whether the underlying data structure that LevelDB provides is even suitable for CometBFT. Current problems experienced by operators seem to suggest otherwise.
- We currently do not have a very clear set of requirements for an underlying database for CometBFT.
RFC-001 provides some more detail around the problem space.
Work Breakdown
In order to achieve our overall goal of storage simplification, we need to complete the following work.
Original issue: tendermint/tendermint#9749
cometbft/cometbft-db#112 (comment)
So that's my reasoning. I'm happy to try to make goleveldb faster, too. You will need a beefy machine for the benchmarks.
Here is a branch with benchmarks: