cboudereau / dataseries

Functions for dataseries / timeseries

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

dataseries

License:MIT

data-series functions support for data-series and time-series.

rust

build codecov docs.rs crates.io crates.io (recent)

java

build codecov maven central javadoc

functions

union

Continuous time series union between 2 series. Left and right data can be absent (left and right only cases).

          1     3     10                 20
    Left: |-----|-----|------------------|-
          130   120   95                 160
                           12     15
   Right:                  |------|--------
                           105    110
          1     3     10   12     15     20
Expected: |-----|-----|----|------|------|-
          130,∅ 120,∅ 95,∅ 95,105 95,110 160,110

eventual consistency and conflict resolution for data-series

The crdt example provides an example of the conflict-free replicated data type resolution based on data-series union function and VersionedValue type to solve conflict with a timestamp (any variable supporting partially ordered set) for rust and java.

trade-offs

interval representation

Half-open interval                      Data-series (time-series/gauge)
(n value and 2n points/delta)           (n+1 datapoints)
                            \           
1    3    5                  \          1    3    5           +∞
[----[----[                   \         |----|----|------------
 100   120                    /         100  120  ∅
                             /
                            /
pros cons
half-open interval +same read and write model -illegal state representation
-requires global secondary index to support range queries
data-series/time-series +nosql and TSDB friendly
+less illegal states
+compatibility with time/data-series functions and visualization
+compact format (requires only n+1 datapoint intead of 2n))
+partitionning is trivial (only one dimension)
+updates are less complex (no need to update impacted points of interval)
-read and write models are different
-hole should be represented with a datapoint None value

An interval can be defined by using 2 points (upper and lower bound) with an associated value but it can be difficult to index those 2 points in nosql databases (Global secondary index) or simply using a TSDB (timeseries database).

Another approach consists of defining an intermediate model, a data-series with only one point and one value so that the datapoint fits really well with TSDB and is algorithm friendly.

It becomes also easy to avoid unwanted states; an interval can be defined with 2 points and the last point can be before the first one which is a bug in the domain. You can also define a point and a non negative offset which can work but requires more code.

For database support, interval reading requires 2 reads to compute the interval but the extra read can be hidden in easily in an Iterator.

implementation

rust and java implementation are provided in respective directories.

About

Functions for dataseries / timeseries

License:MIT License


Languages

Language:Java 63.9%Language:Rust 36.1%