calogica / dbt-expectations

Port(ish) of Great Expectations to dbt test macros

Home Page: https://calogica.github.io/dbt-expectations/

[Feature Request] Allow for percentage of rows to be null

rlh1994 opened this issue

Is your feature request related to a problem? Please describe.
We have a column that should usually not be null, but we allow some tolerance because of the way the data is sourced. Currently there is no test that allows a proportion of the records in a column to be null; it's all or nothing.

Describe the solution you'd like
A test (or an option in an existing test) that calculates the proportion of (not) null records and compares it against a specified tolerance.

Describe alternatives you've considered
Creating a custom test or not testing at all.

Additional context

Hi @clausherther I'm happy to work on this feature.

@danhphan that'd be amazing, thanks! 👏 Let me know if I can help with anything. I think we already have a couple of tests that implemented some sort of tolerance level.

Yes, let me look into the code base and its tests in more detail. Thank you!

This is an amazing feature! Any updates?

@rlh1994 you can set tolerances for any test in terms of the absolute number of failing records:

- not_null:
    config:
      error_if: ">1000"
      warn_if: ">500"

But it would be a nice enhancement if you could specify it as a proportion rather than an absolute number...

dbt-utils already has this feature:
https://github.com/dbt-labs/dbt-utils/tree/1.1.1/#not_null_proportion-source

- dbt_utils.not_null_proportion:
    at_least: 0.99
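
For context, here is a minimal sketch of where that test would sit in a schema file. The model name `orders` and column name `customer_id` are hypothetical placeholders, and it assumes the dbt-utils package is already installed in the project:

```yaml
version: 2

models:
  - name: orders              # hypothetical model name
    columns:
      - name: customer_id     # hypothetical column name
        tests:
          - dbt_utils.not_null_proportion:
              at_least: 0.99  # pass only if at least 99% of rows are non-null
```

The test also takes an optional `at_most` argument (default 1.0) if an upper bound on the non-null proportion is needed.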