darrenfu / db-readings


Readings in Databases

A list of papers essential to understanding the latest database developments and to building new database systems. The list is curated and maintained by Darren Fu (@darrenfu). If you want to contribute a paper review to this list, please submit a pull request.

  1. Emerging database technologies
  2. Query management
  3. Query execution
  4. Storage
  5. Miscellaneous papers
  6. Tech talks
  7. References
Emerging database technologies

  1. The Snowflake Elastic Data Warehouse: This paper highlights the key characteristics and features of the Snowflake data warehouse. Its selling points are its SaaS model and its multi-cluster, shared-data architecture. It also separates storage from compute, a design that was very modern in 2016 as systems embraced cloud object storage such as S3, and it provides a DW solution for migrating from traditional databases and Hadoop to public cloud infrastructure.

    It also highlights its unique architecture and features, such as VW (virtual warehouse) elasticity, local caching, file stealing, time travel and efficient cloning (on top of MVCC), min-max based pruning, semi-structured data support (e.g. the VARIANT type), and hierarchical key security models (keywords: key rotation & rekeying). Its execution engine is columnar, vectorized, and push-based, which is very similar to Facebook Velox's implementation.

    It also compares the differentiators versus its competitors: Redshift, BigQuery, and Azure SQL DW (presently Azure Synapse Analytics). Its dominant advantages include horizontal scaling of compute resources, ANSI SQL support, and native handling of semi-structured and nested data. On the other hand, it brings up some technical challenges, such as multi-tenancy (think of Google's big metadata paper) and a fully self-service model.
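The min-max based pruning mentioned above can be sketched in a few lines. This is an illustrative assumption of how per-file statistics drive pruning; the file names, stats layout, and predicate shape below are hypothetical, not Snowflake's actual metadata format:

```python
# Hypothetical sketch of min-max (zone-map) pruning: each file records
# the min and max of a column, and a range predicate skips any file
# whose [min, max] interval cannot overlap the queried range.
from dataclasses import dataclass

@dataclass
class FileStats:
    name: str
    min_val: int  # minimum of the column within this file
    max_val: int  # maximum of the column within this file

def prune(files, lo, hi):
    """Keep only files whose [min, max] range can overlap [lo, hi]."""
    return [f for f in files if f.max_val >= lo and f.min_val <= hi]

files = [
    FileStats("part-0", 1, 100),
    FileStats("part-1", 101, 200),
    FileStats("part-2", 201, 300),
]
# A query like WHERE col BETWEEN 150 AND 250 only needs to scan two files.
survivors = prune(files, 150, 250)
```

The same idea underlies zone maps in many warehouses: pruning is cheap because only small per-file statistics are consulted, never the data itself.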

  2. Assembling a Query Engine From Spare Parts: This paper describes how the Firebolt database was assembled from different open-source components as stepping stones within 18 months. Specifically:

    • They chose Hyrise as the foundation of the SQL parser and planner (we would recommend DuckDB or Calcite now). The pros-and-cons comparison is a highlight of this paper for learning how to select a suitable open-source project as a stepping stone.
    • They chose ClickHouse as the vectorized runtime, and implemented a new Firebolt distributed processing stack to replace ClickHouse's.
    • The storage engine uses a columnar data layout.
    • For cross-network communication, they chose a custom protobuf-based serialization format (Substrait seems to be a better alternative now).

    As lessons learnt, they chose to assemble on top of a solid foundation, to build in a single language for high velocity, and to invest heavily in connecting the different systems, e.g. unifying the type system across the planner and the runtime.

Query management

TBD

Query execution

TBD

Storage

  1. Magma: A High Data Density Storage Engine Used in Couchbase: This paper primarily introduces a set of optimization techniques for the LSM-tree-based storage engine in Couchbase. The next-generation storage engine, Magma, will replace the current one, Couchstore, which is based on a copy-on-write B+Tree. Its design goals are to minimize write amplification, to scale concurrent compactions, to optimize for SSDs, and to lower the memory footprint.

    The gist of the optimizations is to separate the index data structure (the LSM Tree Index) from the document data storage (the Log Structured Object Store, which uses a segmented-log concept and allows range queries by seqno). To separate keys and values, Magma takes a different approach than WiscKey: it leverages sequential I/O access patterns by avoiding index lookups during garbage collection. Other optimization highlights:

    • [Index block cache] A read cache (LRU) for recently read index blocks, used to locate documents on the log-structured storage. This object-level managed cache is more efficient than a block-level cache for document objects.
    • [Compression] Magma uses block-level compression (BLC) with the LZ4 algorithm, which achieves better compression ratios for cold log segments, while Couchstore uses document-level compression.
    • [Garbage collection] The key idea is to maintain a logically sorted delete-list LSM tree of stale document seqnos and their sizes per log segment.
    • [Crash recovery] Maintaining a metadata file that stores a point-in-time snapshot checkpoint.
    • [Scaling with high IOPS] Using async I/O.
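The index/log separation described above can be sketched as follows. This is a minimal toy model under assumed names (the real engine adds per-segment files, compaction, caching, and recovery): an LSM-style index maps keys to seqnos, documents live in a seqno-ordered append-only log, and seqno range scans read the log sequentially without touching the index.

```python
# Illustrative sketch of separating an LSM-style index from a
# seqno-ordered log-structured object store. Class and method names
# are hypothetical, not Magma's actual API.
import bisect

class LogStructuredStore:
    def __init__(self):
        self.log = []        # append-only list of (seqno, key, doc)
        self.index = {}      # key -> latest seqno (stands in for the LSM index)
        self.next_seqno = 0

    def put(self, key, doc):
        self.next_seqno += 1
        self.log.append((self.next_seqno, key, doc))
        self.index[key] = self.next_seqno

    def get(self, key):
        # Point lookup: index gives the seqno, the log gives the document.
        seqno = self.index[key]
        i = bisect.bisect_left(self.log, (seqno,))
        return self.log[i][2]

    def range_by_seqno(self, lo, hi):
        # Sequential scan of the log between two seqnos; no index lookups.
        i = bisect.bisect_left(self.log, (lo,))
        return [(s, k, d) for s, k, d in self.log[i:] if s <= hi]

s = LogStructuredStore()
s.put("a", {"v": 1})
s.put("b", {"v": 2})
s.put("a", {"v": 3})   # a newer version of "a" gets a higher seqno
```

The point of the layout is visible in `range_by_seqno`: because documents are physically ordered by seqno, a scan is one sequential read, which is exactly the access pattern SSDs and the paper's GC design favor.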
  2. Fast Scans on Key-Value Stores: This paper enumerates the dominant factors impacting the performance of a Key-Value Store (KVS) under point queries versus range queries. It depicts the SQL-over-NoSQL architecture and shows the possible compromises involved in supporting mixed OLAP/OLTP workloads on top of a KVS. The alternative approaches are compared on four performance characteristics: storage efficiency (fragmentation), concurrency (addition conflicts), the cost of implementing versioning and GC, and the efficiency of scan and get/put operations. There are a total of 24 (3 × 2 × 2 × 2) ways to build a KVS using this taxonomy:

    • (update-in-place vs. log-structured vs. delta-main)
    • (row-major vs. column-major / PAX)
    • (clustered-versions vs. chained-versions)
    • (periodic vs. piggy-backed garbage collection)

    The authors implemented two of the variants, as below. TellStore makes careful compromises with regard to latency vs. throughput tradeoffs (e.g. batching) and time vs. space tradeoffs (a lock-free hash table) to help remedy the effects of concurrent updates on scans.

    • TellStore-Log: based on a log-structured layout with chained-versions in a row-major format, and
    • TellStore-Col: using a delta-main structure with clustered-versions in a column-major format.
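The chained-versions dimension of the taxonomy can be illustrated with a short sketch. This is a hedged toy model, not TellStore's implementation: each newly written record links back to the previous version of the same key, and a snapshot read walks the chain until it reaches a version visible at the snapshot timestamp.

```python
# Illustrative sketch of a "chained-versions" layout: the newest
# version of each key is the chain head, and older versions hang off
# it via prev pointers. Names are hypothetical.
class Version:
    def __init__(self, value, ts, prev=None):
        self.value = value   # payload
        self.ts = ts         # commit timestamp of this version
        self.prev = prev     # link to the next-older version (the chain)

class ChainedStore:
    def __init__(self):
        self.head = {}       # key -> newest Version

    def put(self, key, value, ts):
        # The old head becomes the new version's predecessor.
        self.head[key] = Version(value, ts, self.head.get(key))

    def read(self, key, snapshot_ts):
        v = self.head.get(key)
        while v is not None and v.ts > snapshot_ts:
            v = v.prev       # skip versions newer than the snapshot
        return None if v is None else v.value

kv = ChainedStore()
kv.put("x", "old", ts=10)
kv.put("x", "new", ts=20)
```

Chained-versions make writes cheap (prepend to the chain) but can make scans chase pointers; clustered-versions, the other option in the taxonomy, keep all versions of a key physically together, which is why the scan-oriented TellStore-Col chooses them.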

Miscellaneous papers

TBD

Tech talks

  1. Power of the Log: LSM & Append Only Data Structures (2017)
References

  1. VLDB Papers in 2022
  2. Dr. Jana Giceva Makreshanska's publications
