ArroyoSystems / arroyo

Distributed stream processing engine in Rust

Home Page: https://arroyo.dev

Optimize Aggregates for offset Sliding Windows

jacksonrnewhouse opened this issue

In the SQL planning logic, aggregates over sliding windows use a two-phase aggregation strategy: the first phase rolls up an intermediate aggregate for each step-sized bin, and the second phase updates the overall window aggregate on each step. The current implementation requires that the step evenly divide the width; when it doesn't, the planner falls back to a much slower approach.

It is possible to modify the current algorithm so that it works with offset sliding windows, as follows:

Let the width be W, the step be S, and the remainder R = W % S be nonzero. Rather than doing all the work once per step, you take one set of actions at each time T where T % S = 0, and another set at each time T where T % S = R.
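To make the two trigger points concrete, here is a minimal Rust sketch that classifies a time T by which set of actions applies. The `Tick` enum and `classify` function are hypothetical names used only for illustration, not part of Arroyo, and timestamps are assumed to be plain integers.

```rust
/// Which set of actions, if any, applies at time `t` (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Tick {
    /// t % step == 0: the full bin [t - step, t) is complete.
    FullBin,
    /// t % step == width % step: the partial bin [t - remainder, t) is complete.
    PartialBin,
    /// Nothing is scheduled at this time.
    Idle,
}

/// Classify `t` for a sliding window of the given width and step.
/// Assumes the offset case, i.e. the step does not evenly divide the width.
fn classify(t: u64, width: u64, step: u64) -> Tick {
    let remainder = width % step;
    assert!(remainder != 0, "this sketch only covers the offset case");
    match t % step {
        0 => Tick::FullBin,
        r if r == remainder => Tick::PartialBin,
        _ => Tick::Idle,
    }
}
```

For example, with W = 10 and S = 4 (so R = 2), times 0, 4, 8 are full-bin ticks and times 2, 6, 10 are partial-bin ticks. Because R is nonzero and less than S, the two cases never coincide.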

When T % S = 0, calculate the partial aggregate over [T-S, T). All currently active windows then consume it, and any new windows are initialized.

When T % S = R, calculate the final partial aggregate over [T-R, T), then have the second-phase aggregation consume it and emit for all live windows. If this is the last time a given window will be emitted, evict it from tracking.
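The two cases above can be sketched end to end for a simple SUM aggregate. This is an illustration under stated assumptions, not Arroyo's implementation: `OffsetSlidingSum`, `insert`, `tick`, and `run` are hypothetical names; timestamps are integers; windows are assumed to cover [k*S, k*S + W) for k = 0, 1, 2, ...; data arrives in order with no late events; and each window is emitted exactly once, at the tick where it closes, then evicted.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical two-phase SUM aggregator for windows [k*step, k*step + width).
struct OffsetSlidingSum {
    width: u64,
    step: u64,
    remainder: u64,
    /// Phase-one input: per-timestamp sums not yet rolled into a full bin.
    pending: BTreeMap<u64, i64>,
    /// Phase-two state: (window start, sum of full bins so far), oldest first.
    live: VecDeque<(u64, i64)>,
}

impl OffsetSlidingSum {
    fn new(width: u64, step: u64) -> Self {
        let remainder = width % step;
        assert!(remainder != 0, "this sketch only covers the offset case");
        Self { width, step, remainder, pending: BTreeMap::new(), live: VecDeque::new() }
    }

    fn insert(&mut self, time: u64, value: i64) {
        *self.pending.entry(time).or_insert(0) += value;
    }

    /// Advance to time `t`. Ticks must arrive in increasing order and must not
    /// skip any `t` with t % step == 0 or t % step == remainder.
    /// Returns (start, end, sum) if a window closes at `t`.
    fn tick(&mut self, t: u64) -> Option<(u64, u64, i64)> {
        if t % self.step == 0 {
            if t >= self.step {
                // Roll up the full bin [t - step, t); earlier bins were already drained.
                let later = self.pending.split_off(&t);
                let bin: i64 = self.pending.values().sum();
                self.pending = later;
                // Every currently live window consumes the full bin.
                for (_, sum) in self.live.iter_mut() {
                    *sum += bin;
                }
            }
            // A new window starts at every multiple of the step.
            self.live.push_back((t, 0));
            None
        } else if t % self.step == self.remainder && t >= self.width {
            // The partial bin [t - remainder, t) completes the oldest live window.
            let partial: i64 = self
                .pending
                .range((t - self.remainder)..t)
                .map(|(_, v)| *v)
                .sum();
            // Emit the window that closes now and evict it from tracking.
            let (start, full_bins) = self.live.pop_front()?;
            debug_assert_eq!(start, t - self.width);
            Some((start, t, full_bins + partial))
        } else {
            None
        }
    }
}

/// Feed all events, then tick through every integer time up to `horizon`.
fn run(events: &[(u64, i64)], width: u64, step: u64, horizon: u64) -> Vec<(u64, u64, i64)> {
    let mut agg = OffsetSlidingSum::new(width, step);
    for &(time, value) in events {
        agg.insert(time, value);
    }
    (0..=horizon).filter_map(|t| agg.tick(t)).collect()
}
```

With W = 10, S = 4, and one event of value t at each time t in 0..14, `run` yields the windows [0, 10) with sum 45 and [4, 14) with sum 85. Each window's total is the sum of its full bins plus one partial bin, so the per-tick work stays proportional to the number of live windows rather than to the number of raw events in the window.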