quixio / quix-streams

Quix Streams - A library for data streaming and Python Stream Processing

Add ability to optionally overwrite values for existing timestamps when adding to the buffer

w-as opened this issue

Is your feature request related to a problem? Please describe.
When adding data to the buffer, it would be helpful to be able to optionally overwrite data that is already in the buffer for a specific parameter and timestamp, before the buffered data is published.

Given the following use-case:

Source data arrives in the producer via a series of data packets, each containing data for a subset of parameters and timestamps, and these packets of data may include back-filled data.
For example:

"packets": [
  { // First packet - Note samples 0.4 and 0.8 are missing
     "timestamps": [0.0, 0.1, 0.2, 0.3, 0.5, 0.6, 0.7, 0.9], 
     "parameter1": [  0,   1,   2,   3,   5,   6,   7,   9],
     "parameter2": [  0,   2,   4,   6,  10,  12,  14,  18]
  },
  { // Second packet - Data sequential and complete
     "timestamps": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5],
     "parameter1": [ 10,  11,  12,  13,  14,  15],
     "parameter2": [ 20,  22,  24,  26,  28,  30]
  },
  { // Third packet - Includes back-fill data for first packet and data which follows the second packet
     "timestamps": [0.4, 0.8, 1.6, 1.7, 1.8, 1.9],
     "parameter1": [  4,   8,  32,  34,  36,  38],
     "parameter2": [  8,  16,  64,  68,  72,  76]
  }
]

To present a consistent timeseries to the consumers of the Quix stream at a consistent sample rate, we project all timestamps between the first and last timestamp of each packet as we process it. If an expected timestamp is not available, we carry forward the most recent value when producing our Quix timestamp data.

For example, as the first packet above is missing samples for 0.4 and 0.8, when processing the packet, we output the following:

{ // Projected samples:                 __+__                    ___+__
     "timestamps": [0.0, 0.1, 0.2, 0.3, |0.4|,  0.5,  0.6,  0.7, | 0.8|,  0.9], 
     "parameter1": [0.0, 1.0, 2.0, 3.0, |3.0|,  5.0,  6.0,  7.0, | 7.0|,  9.0],
     "parameter2": [0.0, 2.0, 4.0, 6.0, |6.0|, 10.0, 12.0, 14.0, |14.0|, 18.0]
}
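
The projection step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `project_packet` and the `step` parameter (assumed to be 0.1, matching the sample spacing in the example packets) are hypothetical and not part of Quix Streams.

```python
def project_packet(timestamps, values, step=0.1):
    """Fill every expected timestamp between the first and last one in a
    packet, carrying the most recent value forward into the gaps."""
    # Work in integer ticks so that 0.1-spaced floats compare reliably.
    ticks = [round(t / step) for t in timestamps]
    received = dict(zip(ticks, values))

    out_timestamps, out_values = [], []
    last = None
    for tick in range(min(ticks), max(ticks) + 1):
        if tick in received:
            last = received[tick]  # a real sample: remember it
        out_timestamps.append(round(tick * step, 9))
        out_values.append(last)    # real sample, or the carried-forward one
    return out_timestamps, out_values


# parameter1 from the first packet: samples 0.4 and 0.8 are missing.
ts, vs = project_packet(
    [0.0, 0.1, 0.2, 0.3, 0.5, 0.6, 0.7, 0.9],
    [0, 1, 2, 3, 5, 6, 7, 9],
)
print(vs)  # [0, 1, 2, 3, 3, 5, 6, 7, 7, 9]
```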

However, for the third packet, we would project all timestamps between 0.4 and 1.9 as below:

{ // Projected samples:  _______+_______       _________________+_________________
     "timestamps": [0.4, |0.5, 0.6, 0.7|, 0.8, |0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5|, 1.6, 1.7, 1.8, 1.9],
     "parameter1": [  4, |  4,   4,   4|,   8, |  8,   8,   8,   8,   8,   8,   8|,  32,  34,  36,  38],
     "parameter2": [  8, |  8,   8,   8|,  16, | 16,  16,  16,  16,  16,  16,  16|,  64,  68,  72,  76]
}

If we process these packets in order and the latest value wins, as is the case with the current Quix.Streams implementation, our projected timestamps will overwrite the original (correct) data.
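
A minimal sketch of the problem, using a plain dictionary as a stand-in for the buffer (an assumption for illustration, not the library's actual data structure) and only part of the projected third packet:

```python
# One value per timestamp; every write replaces whatever was there before.
buffer = {}


def add_latest_wins(timestamps, values):
    for t, v in zip(timestamps, values):
        buffer[t] = v  # unconditional overwrite


# Second packet (parameter1): received data.
add_latest_wins([1.0, 1.1, 1.2, 1.3, 1.4, 1.5], [10, 11, 12, 13, 14, 15])

# Third packet (parameter1) after projection: 0.9 to 1.5 carry the value 8 forward.
add_latest_wins(
    [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
    [8, 8, 8, 8, 8, 8, 8, 8, 32],
)

print(buffer[1.2])  # 8: the received value 12 has been lost
```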

Describe the solution you'd like
Our preferred solution is to set a status flag for each timestamp while projecting the data, before adding it to the Quix buffer, which indicates whether the value has been carried over. When this status flag indicates that the value is carried over, but a value for that parameter and timestamp has already been added to the buffer, we would like the new value to be ignored.

However, when no value for the timestamp has previously been added to the buffer, we would like the new value to be added. When the status flag indicates that the new value is valid, we would always like it to replace any existing value in the buffer before it is published, including a value that was previously carried over.
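
The rules in the two paragraphs above reduce to a small decision function. The sketch below is one reading of them; the name `should_write` and its arguments are hypothetical.

```python
def should_write(already_buffered: bool, is_valid: bool) -> bool:
    """Decide whether an incoming sample should be written to the buffer.

    already_buffered: a value already exists for this parameter and timestamp.
    is_valid: the incoming sample was received rather than carried forward.
    """
    if not already_buffered:
        return True   # nothing to protect: always add the sample
    return is_valid   # replace existing data only with a received sample


# All four combinations of the two inputs:
for buffered in (False, True):
    for valid in (False, True):
        print(buffered, valid, should_write(buffered, valid))
```

Note that under this reading a carried-over value never replaces an earlier carried-over value, which matches the `overwrite` semantics proposed in this issue.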

To enable this behaviour, we propose adding the following functionality to the Quix.Streams library:

  • Add the ability to retrieve an existing TimeseriesDataTimestamp instance for a specific timestamp if one was previously added to the buffer.
  • Add overloads of the TimeseriesDataTimestamp.AddValue methods that take a bool overwrite argument which, when a value is already available in the buffer, indicates whether it should be replaced by the provided value or the provided value should be ignored.

Usage could then be as follows:

for (var i = 0; i < samples.Timestamps.Length; i++)
{
    var sampleTimestamp = samples.Timestamps[i];
    var value = samples.Values[i];
    var status = samples.Status[i];

    var timestamp = producer.Timeseries.Buffer.GetOrAddTimestampNanoseconds(sampleTimestamp);
    timestamp.AddValue(parameterId, value, overwrite: status); // Only overwrite *existing* samples if status is true
}
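
To make the intended behaviour concrete, the same flow can be modelled in plain Python. The `Buffer` and `TimestampRow` classes below are hypothetical stand-ins written for this sketch; they are not the Quix Streams API, and they key rows by float seconds rather than nanoseconds for brevity.

```python
class TimestampRow:
    """Hypothetical stand-in for TimeseriesDataTimestamp: values keyed by parameter id."""

    def __init__(self):
        self.values = {}

    def add_value(self, parameter_id, value, overwrite=True):
        # With overwrite=False an existing value is kept and the new one is ignored.
        if overwrite or parameter_id not in self.values:
            self.values[parameter_id] = value
        return self


class Buffer:
    """Hypothetical stand-in for the timeseries buffer, keyed by timestamp."""

    def __init__(self):
        self.rows = {}

    def get_or_add_timestamp(self, timestamp):
        # Return the row already buffered for this timestamp, or create one.
        return self.rows.setdefault(timestamp, TimestampRow())


buffer = Buffer()


def add_samples(parameter_id, timestamps, values, status):
    for t, v, s in zip(timestamps, values, status):
        buffer.get_or_add_timestamp(t).add_value(parameter_id, v, overwrite=s)


# Second packet, parameter1: all samples received (status True).
add_samples("parameter1", [1.0, 1.1, 1.2], [10, 11, 12], [True, True, True])

# Part of the projected third packet: 0.9 to 1.2 are carried forward (status False).
add_samples(
    "parameter1",
    [0.8, 0.9, 1.0, 1.1, 1.2],
    [8, 8, 8, 8, 8],
    [True, False, False, False, False],
)

print(buffer.rows[1.2].values["parameter1"])  # 12: the received value is preserved
print(buffer.rows[0.9].values["parameter1"])  # 8: the carried value fills the gap
```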

Describe alternatives you've considered
N/A

Additional context
N/A

Leaving the relevant PR here, which I closed for now, but the solution is probably not too far from what we'll ultimately go for. #144