Macro for aggregating per-second rates - need to simplified SQL query which allows get rates with dimensions but show only one time series with sum/avg
oplehto opened this issue · comments
One common pattern in our metrics is that we want to calculate the rate of a large number of counters and then aggregate those series, typically using a sum. A good example of this would be the total transmit bandwidth for all interfaces grouped by datacenter.
There may be hundreds of series here so we really need to push down the query to ClickHouse so doing workarounds with transforms would not really help. It is doable but
I've included below the InfluxDB and ClickHouse versions of the raw query. Having a macro would simplify the CH version considerably. I was thinking this could be a $perSecond* variant with 2 extra parameters the aggregation function (typically sum) and the columns to aggregate over.
InfluxDB:
SELECT sum(x) FROM (SELECT non_negative_derivative(max("PortXmitData_bits"), 1s) AS x FROM "netstats" WHERE ("datacenter" =~ /^$datacenter$/) AND env = 'prod' AND $timeFilter GROUP BY time($__interval),host,interface) GROUP BY time($__interval) fill(null)
ClickHouse:
SELECT
t,
datacenter,
groupArray((datacenter, max_0_Rate_aggregated)) AS groupArr
FROM
(
SELECT
t,
datacenter,
sum(max_0_Rate) AS max_0_Rate_aggregated
FROM (
SELECT
t,
datacenter,
perSecondColumns,
if(runningDifference(max_0) < 0, nan, runningDifference(max_0) / runningDifference(t / 1000)) AS max_0_Rate
FROM
(
SELECT
$timeSeries as t,
concat(host, interface) AS perSecondColumns,
max(PortXmitData_bits) AS max_0,
datacenter
FROM default.netstats
WHERE
$timeFilter
AND datacenter in ($datacenter) AND env = 'prod'
GROUP BY
t,
datacenter,
perSecondColumns
ORDER BY
perSecondColumns,
datacenter,
t
)
)
GROUP BY datacenter, t
ORDER BY datacenter, t
)
GROUP BY datacenter, t
ORDER BY datacenter, t
Why $perSecondColumns
is not applicable for you?
Can I do this with $perSecondColumns? I may have missed something.
I have a counter that has the identifying columns datacenter
, host
and interface
. So I want to do the following:
- Rate of each counter (so GROUP BY all 3 columns)
- Sum of the rates grouped by
datacenter
The first part I can do with $perSecondColumn
resulting in 1 graph per counter but how do I do the second level aggregation to get the sum of the counters?
How many time series
do you need?
Is only one series per datacenter or one per datacenter plus one per datacenter+interface?
In this particular case the time series would be per datacenter but I have queries that I'm planning to migrate to ClickHouse where we need the top-level GROUP BY based on multiple columns.
maybe GROUP BY ... WITH ROLLUP ..
will good solution for your use case?
https://clickhouse.com/docs/en/sql-reference/statements/select/group-by/#with-rollup-modifier
main PAIN
we need to simplify SQL query in ClickHouse or create macros like $aggregatedRate
which allow get analog of prometheus rate/increase
function with dimension
but show only one time series in the result with sum/avg/min/max results of internal rate