calogica / dbt-expectations

Port(ish) of Great Expectations to dbt test macros

Home Page: https://calogica.github.io/dbt-expectations/

[Feature Request] Support for large deviation tests for partitioned BigQuery tables

praveen-prashant opened this issue

Is your feature request related to a problem? Please describe.
We use weekly partitioned BigQuery tables for our mart layer, and we create and update them with dbt. We often need to ensure that table row counts do not vary too much from week to week, but dbt-expectations currently has no built-in test for this.

Describe the solution you'd like
A new test that checks whether the row count of each partition deviates from that of the preceding partition by more than a threshold. For example, the configuration below would raise a warning if the row count of any partition of my_mart_table differs by more than 20% from the row count of the directly preceding partition:

version: 2

models:
  - name: my_mart_table
    tests:
      - not_large_deviation_count:
          severity: warn
          threshold: 0.2
          partition_column: partition_date          

Describe alternatives you've considered
We have implemented a custom generic test, shown below:

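{% macro test_not_large_deviation_count(model, threshold, partition_column='partition_date', row_condition='1=1') %}

-- Row count per partition, optionally restricted by row_condition
WITH counts AS (
  SELECT
        {{ partition_column }} AS partition_date,
        COUNT(*) AS observation_count
  FROM {{ model }}
  WHERE {{ row_condition }}
  GROUP BY 1
),

-- Pair each partition with the directly preceding one.
-- Sorting newest-first makes LEAD return the next-older partition's count
-- (NULL for the oldest partition, which therefore never fails).
compare AS (
  SELECT
      partition_date,
      observation_count AS current_value,
      LEAD (observation_count) OVER (ORDER BY partition_date DESC) AS preceding_value
  FROM counts
)

-- Returned rows are failures: partitions whose relative change exceeds the threshold.
-- SAFE_DIVIDE (BigQuery) returns NULL instead of raising an error when preceding_value is 0.
SELECT
      *
FROM compare
WHERE ABS(SAFE_DIVIDE(current_value - preceding_value, preceding_value)) > {{ threshold }}

{% endmacro %}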

Additional context
It would also be nice to have column-level tests that check partition-on-partition deviations in aggregate values such as max, min, and avg, in the same way as for row counts.
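
To make the column-level idea concrete, here is a rough sketch of how the row count test above might be generalized. The macro name test_not_large_deviation_aggregate and the aggregate parameter are hypothetical and not part of dbt-expectations; column_name is the argument dbt passes to a generic test declared under a column. Like the custom test, it assumes BigQuery (SAFE_DIVIDE).

```sql
{# Hypothetical sketch: the macro name and the `aggregate` parameter are assumptions,
   not an existing dbt-expectations test. #}
{% macro test_not_large_deviation_aggregate(model, column_name, threshold, aggregate='avg', partition_column='partition_date', row_condition='1=1') %}

-- One aggregate value per partition; `aggregate` is assumed to be the name of a
-- single-argument SQL aggregate function such as min, max, avg or sum
WITH aggregates AS (
  SELECT
      {{ partition_column }} AS partition_date,
      {{ aggregate }}({{ column_name }}) AS current_value
  FROM {{ model }}
  WHERE {{ row_condition }}
  GROUP BY 1
),

-- Pair each partition with the directly preceding one (NULL for the earliest partition)
compare AS (
  SELECT
      partition_date,
      current_value,
      LAG(current_value) OVER (ORDER BY partition_date) AS preceding_value
  FROM aggregates
)

-- Returned rows are failures: partitions whose aggregate changed by more than the threshold
SELECT
    *
FROM compare
WHERE ABS(SAFE_DIVIDE(current_value - preceding_value, preceding_value)) > {{ threshold }}

{% endmacro %}
```

Under these assumptions, the test would be declared under a column's tests in the schema file with threshold, aggregate and partition_column as arguments, in the same way as not_large_deviation_count is declared on the model above. Partitions whose preceding value is 0 or missing produce a NULL ratio and are not flagged.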