Altinity / clickhouse-grafana

Clickhouse datasource for grafana

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Macro for aggregating per-second rates - need to simplified SQL query which allows get rates with dimensions but show only one time series with sum/avg

oplehto opened this issue · comments

One common pattern in our metrics is that we want to calculate the rate of a large number of counters and then aggregate those series, typically using a sum. A good example of this would be the total transmit bandwidth for all interfaces grouped by datacenter.

There may be hundreds of series here so we really need to push down the query to ClickHouse so doing workarounds with transforms would not really help. It is doable but

I've included below the InfluxDB and ClickHouse versions of the raw query. Having a macro would simplify the CH version considerably. I was thinking this could be a $perSecond* variant with 2 extra parameters the aggregation function (typically sum) and the columns to aggregate over.

InfluxDB:

SELECT sum(x) FROM (SELECT non_negative_derivative(max("PortXmitData_bits"), 1s) AS x FROM "netstats" WHERE ("datacenter" =~ /^$datacenter$/) AND env = 'prod' AND $timeFilter GROUP BY time($__interval),host,interface) GROUP BY time($__interval) fill(null)

ClickHouse:

SELECT
    t,
    datacenter,
    groupArray((datacenter, max_0_Rate_aggregated)) AS groupArr 
FROM
(
  SELECT 
    t,
    datacenter,
    sum(max_0_Rate) AS max_0_Rate_aggregated
    FROM (
    SELECT
        t,
        datacenter,
        perSecondColumns,
        if(runningDifference(max_0) < 0, nan, runningDifference(max_0) / runningDifference(t / 1000)) AS max_0_Rate
    FROM
(
        SELECT
            $timeSeries as t,
            concat(host, interface) AS perSecondColumns,
            max(PortXmitData_bits) AS max_0,
            datacenter
        FROM default.netstats

        WHERE
            $timeFilter
            AND datacenter in ($datacenter) AND env = 'prod'
        GROUP BY
            t,
            datacenter,
            perSecondColumns
        ORDER BY
            perSecondColumns,
            datacenter,
            t
)
)
GROUP BY datacenter, t
ORDER BY datacenter, t
)
GROUP BY datacenter, t
ORDER BY datacenter, t

Why $perSecondColumns is not applicable for you?

Can I do this with $perSecondColumns? I may have missed something.

I have a counter that has the identifying columns datacenter, host and interface. So I want to do the following:

  1. Rate of each counter (so GROUP BY all 3 columns)
  2. Sum of the rates grouped by datacenter

The first part I can do with $perSecondColumn resulting in 1 graph per counter but how do I do the second level aggregation to get the sum of the counters?

How many time series do you need?
Is only one series per datacenter or one per datacenter plus one per datacenter+interface?

In this particular case the time series would be per datacenter but I have queries that I'm planning to migrate to ClickHouse where we need the top-level GROUP BY based on multiple columns.

maybe GROUP BY ... WITH ROLLUP .. will good solution for your use case?

https://clickhouse.com/docs/en/sql-reference/statements/select/group-by/#with-rollup-modifier

main PAIN

we need to simplify SQL query in ClickHouse or create macros like $aggregatedRate
which allow get analog of prometheus rate/increase function with dimension
but show only one time series in the result with sum/avg/min/max results of internal rate