[Feature Request] Support for large deviation tests for partitioned BigQuery tables
praveen-prashant opened this issue
**Is your feature request related to a problem? Please describe.**
We use weekly partitioned BigQuery tables for our mart layer and use dbt to create and update them. We often need to ensure that table row counts do not vary too much week over week, but there is currently no built-in test in dbt-expectations that allows us to check this.
**Describe the solution you'd like**
A new test that flags partitions whose row count deviates from the directly preceding partition by more than a threshold. Something like the below, which throws a warning for a table `my_mart_table` if any partition's row count deviates by more than 20% from the preceding partition:
```yaml
version: 2
models:
  - name: my_mart_table
    tests:
      - not_large_deviation_count:
          severity: warn
          threshold: 0.2
          partition_column: partition_date
```
**Describe alternatives you've considered**
We have implemented a custom test like the one below:
```sql
{% macro test_not_large_deviation_count(model, threshold, partition_column='partition_date', row_condition='1=1') %}

WITH counts AS (
    SELECT
        {{ partition_column }} AS partition_date,
        COUNT(*) AS observation_count
    FROM {{ model }}
    WHERE {{ row_condition }}
    GROUP BY 1
),

compare AS (
    SELECT
        partition_date,
        observation_count AS current_value,
        LEAD(observation_count) OVER (ORDER BY partition_date DESC) AS preceding_value
    FROM counts
)

SELECT *
FROM compare
WHERE ABS(SAFE_DIVIDE(current_value - preceding_value, preceding_value)) > {{ threshold }}

{% endmacro %}
```
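For reference, the partition-on-partition comparison the macro performs can be sketched in plain Python (the function name and the list-of-tuples input are illustrative, not part of the macro):

```python
def large_deviations(counts, threshold):
    """Flag partitions whose row count deviates from the directly
    preceding partition by more than `threshold` (as a fraction).

    `counts` is a list of (partition_date, row_count) tuples.
    """
    # Sort ascending by partition date so each element's predecessor
    # is simply the previous element.
    ordered = sorted(counts)
    flagged = []
    for (_, prev_count), (date, count) in zip(ordered, ordered[1:]):
        if prev_count == 0:
            # Mirrors SAFE_DIVIDE returning NULL on division by zero,
            # which the SQL WHERE clause does not flag.
            continue
        if abs((count - prev_count) / prev_count) > threshold:
            flagged.append(date)
    return flagged


weekly = [("2023-01-02", 1000), ("2023-01-09", 1050), ("2023-01-16", 700)]
print(large_deviations(weekly, 0.2))  # only the ~33% drop on 2023-01-16 is flagged
```

As in the SQL, the earliest partition has no predecessor and is never flagged, and a zero-count predecessor is skipped rather than treated as an infinite deviation.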
**Additional context**
It would be nice to have other column-level comparison tests that check for partition-on-partition deviations in aggregate values like `max`, `min`, `avg`, etc. for columns in a table, just like row counts.
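The generalization to arbitrary aggregates could work like this sketch (function names and the per-row input shape are hypothetical; in the macro the aggregate would simply replace `COUNT(*)` in the `counts` CTE):

```python
from collections import defaultdict


def aggregate_deviations(rows, agg, threshold):
    """Flag partitions where `agg` of a column's values deviates from the
    directly preceding partition by more than `threshold` (as a fraction).

    `rows` is a list of (partition_date, value) tuples; `agg` is any
    reducer over one partition's values (max, min, a mean function, ...).
    """
    by_partition = defaultdict(list)
    for date, value in rows:
        by_partition[date].append(value)

    # One aggregate value per partition, in ascending date order.
    stats = [(date, agg(vals)) for date, vals in sorted(by_partition.items())]

    flagged = []
    for (_, prev), (date, cur) in zip(stats, stats[1:]):
        # Skip zero predecessors, mirroring SAFE_DIVIDE's NULL result.
        if prev and abs((cur - prev) / prev) > threshold:
            flagged.append(date)
    return flagged


rows = [("2023-W01", 10), ("2023-W01", 30), ("2023-W02", 12), ("2023-W02", 50)]
# max per week: W01 -> 30, W02 -> 50; deviation |20 / 30| ≈ 0.67 > 0.2
print(aggregate_deviations(rows, max, 0.2))
```

Passing `min` or a mean function instead of `max` covers the other cases with no change to the comparison logic.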