influxdata / flux

Flux is a lightweight scripting language for querying databases (like InfluxDB) and working with data. It's part of InfluxDB 1.7 and 2.0, but can be run independently of those.

Home Page: https://influxdata.com

contrib/anaisdg/anomalydetection mad function is incorrect

anussel5559 opened this issue

Currently, the diff_med table is calculated as follows:

diff_med =
    diff
        |> median(column: "_value")
        |> map(fn: (r) => ({r with MAD: k * r._value}))
        |> filter(fn: (r) => r.MAD > 0.0)

This correctly assigns the MAD value to the MAD column of the diff_med table as k * median(abs(x - median(xi))). (The underlying _value column comes from the diff table, which holds the absolute difference between each individual value and the dataset's median.)
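As a rough numeric illustration of that formula outside Flux, the Python sketch below computes the scaled MAD for a small made-up sample. The sample values, the helper name `scaled_mad`, and the constant k = 1.4826 (the usual consistency factor for normally distributed data) are assumptions for illustration; the issue itself does not state the value of k.

```python
from statistics import median

K = 1.4826  # assumed scale constant; the issue does not state the value of k


def scaled_mad(values, k=K):
    """Return k * median(abs(x - median(values)))."""
    center = median(values)
    # Absolute deviations from the median: the role played by the diff table.
    abs_dev = [abs(x - center) for x in values]
    # Median of the deviations, scaled by k: the role of the MAD column.
    return k * median(abs_dev)


sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical data
print(scaled_mad(sample))
```

For this sample the median is 3.0 and the absolute deviations are 2, 1, 0, 1 and 97, so the unscaled median deviation is 1.0 and the scaled MAD equals k.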

However, that MAD column is unused in the output calculation:

output =
    join(tables: {diff: diff, diff_med: diff_med}, on: ["_time"], method: "inner")
      |> map(fn: (r) => ({r with _value: r._value_diff / r._value_diff_med}))
      |> map(
          fn: (r) =>
              ({r with level:
                      if r._value >= threshold then
                          "anomaly"
                      else
                          "normal",
              }),
        )

Note: the output table's _value column is calculated in the map as _value_diff / _value_diff_med. The _value column from the diff_med table is NOT the full MAD; it is only the median of the absolute differences, median(abs(x - median(xi))), and is missing the multiplication by the constant k.

The fix here could be as simple as adjusting the map in the diff_med table to:

map(fn: (r) => ({r with _value: k * r._value}))

This would correctly assign the MAD to the _value column, so that it is used in the final output calculation.
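To show why the missing factor matters, here is a minimal Python sketch (not the Flux implementation) that scores the same points with and without k in the denominator. The data, the threshold of 5.0, the helper name `scores`, and k = 1.4826 are all hypothetical choices for illustration.

```python
from statistics import median

K = 1.4826  # assumed scale constant; not stated in the issue


def scores(values, k=K, scale_by_k=True):
    """Divide each absolute deviation by the (optionally k-scaled) median deviation."""
    center = median(values)
    diff = [abs(x - center) for x in values]
    diff_med = median(diff)
    if diff_med == 0:
        # Mirrors the intent of the filter on MAD > 0.0 in the Flux code.
        raise ValueError("median absolute deviation is zero")
    denom = k * diff_med if scale_by_k else diff_med
    return [d / denom for d in diff]


sample = [1.0, 2.0, 3.0, 4.0, 9.0]  # hypothetical data
threshold = 5.0                      # hypothetical threshold

current = scores(sample, scale_by_k=False)  # divides by the raw median deviation
fixed = scores(sample, scale_by_k=True)     # divides by k * median deviation

for x, c, f in zip(sample, current, fixed):
    print(x, round(c, 3), round(f, 3), c >= threshold, f >= threshold)
```

Every score from the unscaled version is exactly k times larger than the scaled one, so with these made-up numbers the point 9.0 crosses the threshold only when k is left out of the denominator.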

Reference code

This issue has had no recent activity and will be closed soon.