influxdata / flux

Flux is a lightweight scripting language for querying databases (like InfluxDB) and working with data. It's part of InfluxDB 1.7 and 2.0, but can be run independently of those.

Home Page: https://influxdata.com


Question: Best practice for transforming time-series data into logs with time ranges

paulwer opened this issue · comments

Hey everyone,
today we encountered a use case where machine status data, stored as time series in InfluxDB, should be transformed into log-like data with a start and a stop timestamp.

For example:

array.from(rows: [
  {_time: 2021-01-01T00:00:00Z, _value: true},
  {_time: 2021-01-01T00:01:00Z, _value: true},
  {_time: 2021-01-01T00:02:00Z, _value: false},
  {_time: 2021-01-01T00:03:00Z, _value: false},
  {_time: 2021-01-01T00:04:00Z, _value: false},
  {_time: 2021-01-01T00:05:00Z, _value: true},
  {_time: 2021-01-01T00:06:00Z, _value: true},
])

should result in:

array.from(rows: [
  {_start: 2021-01-01T00:00:00Z, _stop: 2021-01-01T00:01:00Z, _value: true},
  {_start: 2021-01-01T00:02:00Z, _stop: 2021-01-01T00:04:00Z, _value: false},
  {_start: 2021-01-01T00:05:00Z, _stop: 2021-01-01T00:06:00Z, _value: true},
])

My first suggestion was to use a function like monitor.changeState, but #3582 suggests it is not implemented.
As a workaround / starting point, we tried the following statement, which is not very performant when querying larger datasets.
We use difference to determine whether the boolean value has changed and in which direction, and cumulativeSum on a row counter to keep the most recent row, so the latest state is preserved when it has been "active" for the whole range (and no value change has occurred).
Our API then aggregates this data into the wanted format: { _measurement: string; _field: string; start: Date; stop: Date; _value: boolean; }
Aggregation inside the query is not an option, because we need the precise timestamps.

from(bucket: "bucket-name")
    |> range(start: -5h, stop: now())
    // additional filtering here...
    // cast the boolean to an int and add a per-row counter
    |> map(fn: (r) => ({r with rowId: 1, change: int(v: r._value) }))
    // a non-zero "change" marks a transition between states
    |> difference(initialZero: true, keepFirst: true, columns: ["change"])
    // number rows from newest to oldest: after cumulativeSum, rowId == 1 is the most recent row
    |> sort(columns: ["_time"], desc: true)
    |> cumulativeSum(columns: ["rowId"])
    |> sort(columns: ["_time"], desc: false)
    // keep all transitions, plus the most recent row if the state is still active
    |> filter(fn: (r) => (r._value == true and r.rowId == 1) or r.change != 0)
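
One way to get durations without leaving Flux is the stock elapsed() function, which adds the time between subsequent rows. Appended after the filter above, each remaining row marks the start of a new state, so the elapsed column on a row holds how long the previous state lasted. This is only a sketch and has not been tested against real data:

// appended after the filter above; each remaining row starts a new state,
// so "elapsed" is the duration of the state that just ended
    |> elapsed(unit: 1s, timeColumn: "_time", columnName: "elapsed")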

@UlrichThiess

We also thought about running this query regularly with a smaller range and storing the result in another database.
Is this appropriate? Or would someone suggest a better solution?
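
Storing the change rows on a schedule could be done with an InfluxDB task that re-runs the query over the last interval and writes the output with to(). A sketch; the task name, interval, and bucket names below are placeholders:

option task = {name: "state-changes", every: 1h}

from(bucket: "bucket-name")
    |> range(start: -task.every)
    // same filtering, difference, and cumulativeSum steps as above...
    |> to(bucket: "state-log")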

@UlrichThiess I tagged you because of a previous conversation :)

This issue has had no recent activity and will be closed soon.

Any updates or suggestions?


Problem not solved yet.


...

import "array"

data = array.from(rows: [
  {_time: 2021-01-01T00:00:00Z, _value: true},
  {_time: 2021-01-01T00:01:00Z, _value: true},
  {_time: 2021-01-01T00:02:00Z, _value: false},
  {_time: 2021-01-01T00:03:00Z, _value: false},
  {_time: 2021-01-01T00:04:00Z, _value: false},
  {_time: 2021-01-01T00:05:00Z, _value: true},
  {_time: 2021-01-01T00:06:00Z, _value: true},
])

// Mark the start of a new segment: a non-zero difference of the
// integer-cast value means the boolean state changed on this row;
// cumulativeSum then numbers the segments
marked = data
  |> map(fn: (r) => ({r with state: int(v: r._value)}))
  |> difference(columns: ["state"], initialZero: true, keepFirst: true)
  |> map(fn: (r) => ({r with segment: if r.state != 0 then 1 else 0}))
  |> cumulativeSum(columns: ["segment"])

// Summarize the segments: group the rows by segment number and reduce
// each group to its first timestamp, last timestamp, and value
result = marked
  |> group(columns: ["segment"])
  |> reduce(
      identity: {_start: time(v: 0), _stop: time(v: 0), _value: false, init: true},
      fn: (r, accumulator) => ({
        _start: if accumulator.init then r._time else accumulator._start,
        _stop: r._time,
        _value: r._value,
        init: false,
      }),
    )
  |> group()
  |> drop(columns: ["init", "segment"])

result
