[Feature Request] Allow for percentage of rows to be null
rlh1994 opened this issue · comments
Is your feature request related to a problem? Please describe.
We have a column that should usually not be null, but we allow some tolerance because of the way the data is sourced. Currently there is no test that allows a proportion of the records in a column to be null; it's all or nothing.
Describe the solution you'd like
A test (or an option in an existing test) that calculates the proportion of (not) null records and compares it against a specified tolerance.
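To make the request concrete, here is a rough sketch of the check in Python against an in-memory SQLite table. The helper names (`not_null_proportion`, `passes_null_tolerance`), the `orders` table, and the `at_least` parameter are made up for illustration and are not an existing API; an actual test would be written as a dbt generic test in SQL.

```python
import sqlite3


def not_null_proportion(conn, table, column):
    """Return the fraction of rows in `table` whose `column` is not null."""
    # COUNT(column) skips nulls, COUNT(*) counts every row.
    non_null, total = conn.execute(
        f'SELECT COUNT("{column}"), COUNT(*) FROM "{table}"'
    ).fetchone()
    # Assumption: an empty table is treated as passing.
    return 1.0 if total == 0 else non_null / total


def passes_null_tolerance(conn, table, column, at_least):
    """True when the not-null proportion meets the required minimum."""
    return not_null_proportion(conn, table, column) >= at_least


# Hypothetical data: 100 rows, 2 of them with a null email.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, email TEXT)")
rows = [(i, None if i < 2 else f"user{i}@example.com") for i in range(100)]
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
```

With this data, 98% of the rows are not null, so a tolerance of `at_least=0.95` would pass and `at_least=0.99` would fail.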
Describe alternatives you've considered
Creating a custom test or not testing at all.
Additional context
Hi @clausherther I'm happy to work on this feature.
@danhphan that'd be amazing, thanks! 👏 Let me know if I can help with anything. I think we already have a couple of tests that implement some sort of tolerance level.
Yes, let me look into the code base and its tests in more detail. Thank you!
This is an amazing feature! Any updates?
@rlh1994 you can set tolerances for any test in terms of the absolute number of failing records:
- not_null:
    config:
      error_if: ">1000"
      warn_if: ">500"
But it would be a nice enhancement if you could specify it as a proportion rather than an absolute number...
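A quick bit of arithmetic shows why an absolute threshold is an awkward stand-in for a proportion: the same row count implies a very different tolerance depending on table size. The function name and row counts below are illustrative only.

```python
def implied_null_proportion(max_failing_rows, total_rows):
    """Proportion of rows an absolute failure threshold effectively allows."""
    return max_failing_rows / total_rows


# The same absolute limit of 1000 failing rows, applied to two table sizes.
small_table = implied_null_proportion(1000, 10_000)     # 10% of rows may be null
large_table = implied_null_proportion(1000, 1_000_000)  # 0.1% of rows may be null
```

So the thresholds would need to be retuned as the table grows, whereas a proportion stays meaningful at any size.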
dbt-utils has this feature: https://github.com/dbt-labs/dbt-utils/tree/1.1.1/#not_null_proportion-source
- dbt_utils.not_null_proportion:
    at_least: 0.99