microsoft / Trill

Trill is a single-node query processor for temporal or streaming data.

Aggregate Streams Over 24 Hours

JonathanKeav opened this issue

Hi, I am hoping to use Trill to analyze temporal data. I need an hourly aggregate over the last 24 hours of data. The data arrives as a batch of events every 15 minutes. About 95% of these events will have a start time within the 15 minutes before the current time; about 5% will be older than 15 minutes. The events arrive mostly in increasing time order, with some disorder. The application produces a running picture (reports and dashboards) of the last 24 hours, updated after each batch arrives. I could run the entire batch through a stream and punctuate at the end of the batch, but each batch will have ~5% of events that belong to the previous 15 minutes. Occasionally events lag by 30 or 45 minutes. This is rare, but it can happen, and I need to capture these late events in the aggregations for reporting.

Is it possible to achieve the above scenario with Trill?

commented

Not sure if this would be acceptable, but you could set a DisorderPolicy of Adjust, which moves the ~5% of disordered events that arrive after their time window has lapsed into the current time window. Alternatively, you could drop them altogether (DisorderPolicy.Drop). Unfortunately, Trill does not currently support out-of-order processing, i.e., emitting one result and then correcting it later when more data arrives for that already-lapsed time window.
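
To make the trade-off concrete, here is a minimal Python sketch of how the two policies would affect hourly counts. It is not Trill's API (Trill is a C# library), and it rests on a simplifying assumption: any event whose timestamp is below the largest timestamp seen so far counts as late. The function name `hourly_counts`, the `policy` strings, and the sample timestamps are hypothetical, chosen only for illustration.

```python
from collections import Counter

HOUR = 3600  # window size in seconds


def hourly_counts(arrivals, policy="adjust"):
    """Count events per hourly window under a simplified disorder policy.

    arrivals: event timestamps (seconds) in the order they arrive.
    policy:   "adjust" raises a late event's timestamp to the high watermark;
              "drop" discards the late event.
    This is a conceptual model only, not Trill's implementation.
    """
    counts = Counter()
    watermark = None  # largest timestamp seen so far
    for ts in arrivals:
        if watermark is not None and ts < watermark:
            if policy == "drop":
                continue          # late event is lost
            ts = watermark        # late event is moved forward in time
        else:
            watermark = ts
        window_start = ts // HOUR * HOUR
        counts[window_start] += 1
    return dict(counts)


# Hypothetical arrivals: 3550 and 3650 arrive after later-stamped events.
arrivals = [3500, 3700, 3550, 7300, 3650]

adjusted = hourly_counts(arrivals, "adjust")  # {0: 1, 3600: 2, 7200: 2}
dropped = hourly_counts(arrivals, "drop")     # {0: 1, 3600: 1, 7200: 1}
```

With the events fully sorted, the counts would be 2, 2, and 1 for the three windows. Under this model, adjusting keeps the total event count but credits late events to a later hour, while dropping keeps each window free of misplaced events but undercounts. Neither reproduces the corrected per-hour figures the question asks for.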