xerial / streamdb-readings

Readings in Stream Processing

A list of articles that are essential to understanding stream processing.

Books

Programming Models for Stream Processing

Table Catalog for Stream Processing

Watermark Management for Stream Processing

Workload Optimization

  • Towards a Learning Optimizer for Shared Clouds (VLDB 2019). Learns cardinality models from previous job executions in order to optimize the overall workload. This work uses a multi-layer perceptron (MLP) neural network to learn models from query execution features (e.g., job name, input cardinality, average row length, input dataset names, etc.)
  • CrocodileDB: Efficient Database Execution through Intelligent Deferment (CIDR 2020). This paper introduces the Intermittent Query Processing (IQP) approach, which combines knowledge about new data, query semantics, and users' expectations to reduce the overall processing cost. It uses Deep Q-Materialization (DQM) to make a tradeoff under a given resource constraint (e.g., memory, CPUs, storage) and decide how much data will be cached, pre-computed, pre-loaded, etc.
  • Peregrine: Workload Optimization for Cloud Query Engines (SOCC 2019). Analyzes the workload of historical queries and optimizes recurring queries, similar queries, and coordinating queries by extracting common subexpressions that can be materialized. To support various query engines, including Spark, Microsoft has created a common intermediate representation (IR) of workloads.
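
As a rough illustration of the common-subexpression idea mentioned for Peregrine above, the sketch below counts operator subtrees that are shared across the queries of a workload. It is a toy under stated assumptions, not the method from the paper: the nested-tuple plan encoding, the function names (`subexpressions`, `common_subexpressions`), and the sample queries are all hypothetical.

```python
from collections import Counter

# A plan node is a nested tuple: (operator, child_or_argument, ...).
# This encoding and the sample plans are hypothetical, for illustration only.

def subexpressions(plan):
    """Yield every operator subtree of a plan, including the plan itself."""
    if isinstance(plan, tuple):
        yield plan
        for child in plan[1:]:
            yield from subexpressions(child)

def common_subexpressions(workload, min_queries=2):
    """Return subtrees that occur in at least `min_queries` distinct queries."""
    counts = Counter()
    for plan in workload:
        # Use a set so a subtree repeated inside one query is counted once.
        counts.update(set(subexpressions(plan)))
    return {expr: n for expr, n in counts.items() if n >= min_queries}

scan = ("scan", "events")
recent = ("filter", scan, "ts > now() - 1h")
workload = [
    ("aggregate", recent, "count(*)"),
    ("aggregate", recent, "avg(latency)"),
    ("project", scan, "user_id"),
]

# Subtrees shared by two or more queries are candidates for materialization.
shared = common_subexpressions(workload)
for expr, n in sorted(shared.items(), key=lambda kv: -kv[1]):
    print(n, expr)
```

In this toy workload the scan of `events` is shared by all three queries and the filtered scan by two of them, so both would be candidates for materialization. A real system would also have to match semantically equivalent but syntactically different plans and weigh the storage cost against the saved computation, which this sketch does not attempt.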

Iterative Data Processing

Incremental Processing

Incremental Processing with Materialized Views

Stream Log Collection Systems

Real-Time Stream Processing

Real-time stream processing usually refers to ultra-low-latency applications that must satisfy SLAs requiring results to be returned within a few seconds.

Stream SQL

GitHub Projects

Commercial Services

Stream Ingestion

External Lists
