danni-m / redis-timeseries

Future development of redis-timeseries is at github.com/RedisLabsModules/redis-timeseries.


Add compaction

borg286 opened this issue · comments

Add a compaction command. It should take in a time range and some operation. It would then mutate the underlying chunks so that fewer chunks represent the same time range, reduced by the specified function.
Good functions to support would be min, avg, and max.

This would support the use case of a fixed memory budget while still retaining some historic data at a lower time granularity. Having different functions allows for an opinionated yet versatile way of reducing the data.

We'd need some dt param:
tb.compact key start end (AVG|MAX|MIN) dt
The complication is how you handle gaps in the data within a time range.
It'll have to be a pretty wicked loop that handles the alignment and missing data, but I think it is possible.
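A minimal sketch of what that loop might look like, assuming samples are a sorted list of (timestamp, value) pairs. The function name and signature are illustrative, not part of the module; buckets are aligned to multiples of dt, and empty buckets (gaps) simply produce no output sample:

```python
def compact(samples, start, end, dt, agg):
    """Downsample (timestamp, value) pairs into dt-wide buckets.

    Hypothetical illustration of the proposed tb.compact semantics.
    """
    aggs = {"MIN": min, "MAX": max, "AVG": lambda vs: sum(vs) / len(vs)}
    reduce_fn = aggs[agg]
    out = []
    bucket_start = start - (start % dt)  # align to a dt boundary
    while bucket_start < end:
        vs = [v for t, v in samples if bucket_start <= t < bucket_start + dt]
        if vs:  # gaps (empty buckets) are skipped rather than emitted
            out.append((bucket_start, reduce_fn(vs)))
        bucket_start += dt
    return out
```

For example, samples at t=0, 5, and 25 compacted with dt=10 and AVG produce one sample for the [0, 10) bucket, nothing for the empty [10, 20) bucket, and one for [20, 30).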

Thanks for the feedback.
I'd prefer to go another route on how to implement this, for several reasons:

  1. This command would be O(n) complexity (n = number of samples) and would block any other operation on that Redis instance
  2. You would need to create some other tool that calls this function periodically

I was thinking of creating some kind of aggregated time series that is recorded from a specific source time series.
Meaning that each time a sample is added to a time series, it would trigger an append to the aggregated time series using an aggregation function (i.e. max/min/avg).
Another option: each time we pass the time delta (let's say 1 min), a range query is executed for that interval and the result is added to the aggregated time series.
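The first option above can be sketched as a per-write rollup: each add updates a running aggregate for the current time bucket in O(1), rather than rescanning the range. The class and method names here are illustrative, not the module's actual code:

```python
class RollupSeries:
    """Illustrative on-the-go rollup: each write updates the current bucket."""

    def __init__(self, dt, agg):
        self.dt = dt
        self.agg = agg
        self.buckets = {}  # bucket_start -> running aggregate state

    def add(self, ts, value):
        b = ts - (ts % self.dt)  # bucket this sample falls into
        if self.agg == "avg":
            s, n = self.buckets.get(b, (0.0, 0))
            self.buckets[b] = (s + value, n + 1)  # keep (sum, count)
        elif self.agg == "min":
            self.buckets[b] = min(self.buckets.get(b, value), value)
        elif self.agg == "max":
            self.buckets[b] = max(self.buckets.get(b, value), value)

    def value(self, bucket_start):
        state = self.buckets[bucket_start]
        if self.agg == "avg":
            s, n = state
            return s / n  # finalize avg only on read
        return state
```

Keeping (sum, count) for avg is what makes the per-write update constant-time; the division happens only when the rolled-up value is read.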

@borg286 I've made the groundwork for automatic compactions/rollups in the compactions branch.
If you would like to test it and comment, that would be great; otherwise I'll merge it to master in a few days.

AGG_TYPE - aggregation type, one of the following: avg, sum, min, max, sum
"sum" is listed twice. I'll try it out and provide feedback.

You said "DEST_KEY should be of a timeseries type, and should be created before TS.CREATERULE is called." I wonder if this could be handled in code rather than imposing this rule on the user.
SINTERSTORE doesn't require the destination to exist. While I realize that your code mutates the destination and SINTERSTORE doesn't, I don't feel it would be too hard to check for DEST_KEY's non-existence and pick reasonable initial bounds (0 for sum, the first sample in SOURCE_KEY for min/max/avg).
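The initial bounds suggested above amount to choosing a neutral starting state per aggregation type. A hypothetical helper (not part of the module, which requires pre-creation instead) might look like:

```python
def initial_state(agg, first_sample):
    """Pick a starting value for a fresh DEST_KEY (illustrative sketch)."""
    if agg == "sum":
        return 0.0            # additive identity
    if agg == "count":
        return 0              # nothing counted yet
    if agg in ("min", "max", "avg"):
        return first_sample   # seed with the first SOURCE_KEY sample
    raise ValueError(f"unknown aggregation: {agg}")
```

Seeding min/max/avg with the first observed sample avoids having to pick artificial sentinel values like positive or negative infinity.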

AGG_TYPE - You are right, thanks. The 5th type is actually count.

Regarding DEST_KEY, the main reason for that was to make sure the user is aware that they are aggregating one time series into another.
If SOURCE_KEY has a maximum retention and DEST_KEY doesn't, the user might end up using more memory than intended.

Having said that, it's quite easy to create the time series when calling TS.CREATE; this is also possible with TS.ADD.
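One way to reconcile auto-creation with the retention concern above would be to inherit the source's retention when the destination is created implicitly. A sketch under that assumption, using a plain dict as a stand-in for the keyspace (all names here are illustrative, not the module's internals):

```python
def create_rule(db, source_key, dest_key, agg, dt):
    """Hypothetical auto-creating rule setup (sketch, not the module's API)."""
    if dest_key not in db:
        # Inherit retention from the source so memory stays bounded
        # (illustrative policy addressing the concern above).
        db[dest_key] = {"samples": [], "retention": db[source_key]["retention"]}
    db[source_key].setdefault("rules", []).append((dest_key, agg, dt))
```

This keeps the explicit-creation path available while making the implicit one safe by default.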

Also, please note that the compactions are built "on the go" every time TS.ADD is called; it's pretty fast.
I haven't benchmarked the whole module yet, but from some stress tests I did, the impact seems minimal.

Thanks for the feedback!