AxelThevenot / dbt-assertions

Package to assert rows in-line with dbt macros.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

dbt-assertions-logo.png

License Static Badge GitHub Release GitHub (Pre-)Release Date

build status GitHub issues GitHub pull requests

Features

βœ… Robust Data Quality Checks

dbt-assertions ensures thorough data quality assessments at the row level, enhancing the reliability of downstream models.

πŸ” Efficient Exception Detection

Granular row-by-row exception detection identifies and flags specific rows that fails assertions, streamlining the resolution process.

πŸ› οΈ Customizable Assertions & Easy Integration

Easy-to-use macros assertions() and assertions_filter() empower users to customize without barriers data quality checks within the model YAML definition, adapting to specific data validation needs.

πŸš€ An Easy Shift from your Actual Workflows

A generic test generic_assertions() to perform dbt tests as usual, testing the package easily without compromising your current workflows. you can test the package with this generic test easily without having to rebuild you table

Content

Install

dbt-assertions currently supports dbt 1.7.x or higher.

Check dbt package hub for the latest installation instructions, or read the docs for more information on installing packages.

Include in packages.yml

packages:
  - package: AxelThevenot/dbt_assertions
    version: [">=1.0.0", "<2.0.0"]
    # <see https://github.com/AxelThevenot/dbt-assertions/releases/latest> for the latest version tag

Supported databases

Database assertions assertions_filter __unique__ __not_null__ generic_assertions
BigQuery (default) βœ… βœ… βœ… βœ… βœ…
Snowflake βœ… βœ… βœ…
(not nested fields)
βœ…
(not nested fields)
βœ…
DuckDB βœ… βœ… βœ…
(not nested fields)
βœ…
(not nested fields)
βœ…
Others ℹ️ ℹ️ ℹ️ ℹ️ ℹ️
  • βœ…: supported
  • βœ”οΈ: supported not tested
  • ❌: not supported
  • ℹ️: Opened to contributions ❀️ (see the integration_tests folder on how to test your adapter)

For latest release, see https://github.com/AxelThevenot/dbt-assertions/releases

Dependencies

This package do not have dependencies.

Variables

The following variable can be defined in your dbt_project.yml file to change the default exceptions column name used by the package.

vars:
    "dbt_assertions:default_column": "exceptions"

Basic Example

Check the basic_example example.

Documentation

Macros

assertions() macro generates a select expression for row-level assertions.

Arguments:

  • column (optional[str]): column to read the assertions from.

This macro parses the schema model YAML to extract row-level assertions; custom assertions, unique, and not-null. It then constructs an array of exceptions for each row based on its assertions results.

By default, it will generate assertions based on your YAML model definition reading configuration for a column named exceptions.

You can call the macro using column argument to change this default column.

SELECT
    *,
    {{ dbt_assertions.assertions(column='warnings') }},
FROM {{ ref('my_model') }}

Note: this macro is made to generate assertions based on you query result. It means it must be generated at the end of the query.

WITH
    [...] -- Other CTEs
    final AS (
        SELECT
            [...]
        FROM {{ ref('my_model') }}
    )

-- After query results
SELECT
    *,
    {{ dbt_assertions.assertions() }},
FROM final

assertions_filter() macro generates an expression to filter rows based on assertions results, generated with the assertions() macro.

Arguments:

  • column (optional[str]): Column to read the exceptions from.
  • exclude_list (optional[list[str]]): Assertions to exclude in the filter.
  • include_list (optional[list[str]]): Assertions to include in the filter.
  • reverse (optional[bool]): returns rows without exception when reverse=false, and rows with exceptions when reverse=true.

By default, each row with exception(s) will be filtered.

SELECT
    *
FROM {{ ref('my_model') }}
WHERE {{ dbt_assertions.assertions_filter() }}

You can change this default behaviour specifying an optional exclude_list or include_list argument (not both).

SELECT
    *
FROM {{ ref('my_model') }}
WHERE {{ dbt_assertions.assertions_filter(exclude_list=['assertions_id']) }}

Tests

Generates a test to get rows based on exceptions.

It will returns the rows without any exception by default. You can change this default behaviour specifying a exclude_list or include_list (not both).

You must defined beforehand the assertions for the model. More on YAML definition for assertions.

Arguments:

  • column (optional[str]): Column to read the exceptions from.
  • exclude_list (optional[list[str]]): Assertions to exclude in the filter.
  • include_list (optional[list[str]]): Assertions to include in the filter.
  • re_assert (optional[bool]): to set to true if your assertion field is not calculated in your table.

Configure the generic test in schema.yml with:

model:
  name: my_model
  tests:
    - dbt_assertions.generic_assertions:
      [column: <column_name>]
      [exclude_list: <list(str_to_filter)>]
      [include_list: <list(str_to_filter)>]
      [re_assert: true | false]

  columns:
    ...

[] represents optional parts. Yes everything is optional but let's see it by examples.

In the basic test example you can easily create your test as follows then run your dbt test command.

models:
  - name: basic_test_example_d_site
    tests:
      - dbt_assertions.generic_assertions:
          column: exceptions
          include_list:
            - site_id_is_not_null
          # `re_assert: true` to use only if your assertion's column
          # is not computed and saved in your table.
          re_assert: true

    columns:
      ...

Model definition

Yaml general definition

The assertions definition must be created under a column definition of your model and respects the following.

assertions:
  [__unique__: <unique_expression>]
  [__not_null__: __unique__ | <not_null_expression>]

  [<custom_assertion_id>:
    description: [<string>]
    expression: <string>
    null_as_exception: [<bool>]]
  ...

[] represents optional parts. Yes, everything is optional but let's see it by examples.

Custom assertions

Custom assertions are the basics assertions.

The package is made to support every assertions as long as it is supported in a SELECT statement of your underlying database. So you can do a lot of things.

It is represented as key values. Keys are the ID of the assertions.

Each assertions is defined by at least an expression which will be rendered to be evaluated as your test.

description and null_as_exception are optional.

assertions:
  unique:
    description: "Row must be unique."
    expression: "1 = COUNT(1) OVER(PARTITION by my_id)"

  site_creation_date_is_past:
    description: "Site must be created in the past."
    expression: "site_creation_date <= CURRENT_DATE()"

  site_type_pick_list:
    description: "The site type be must in its known picklist."
    expression: |
        site_type IN (
            'store',
            'ecommerce',
            'drive',
            'pickup'
        )

null_as_exception

null_as_exception is an optional configuration for your assertion. Default to false it is the return result if your expression is evaluated to NULL.

Default behaviour is set to false because one assertion must evaluate on thing. Prefer using the __not_null_ helper instead.

In our previous, if we want to also avoid NULL site types.

assertions:
  ...

  site_type_pick_list:
    description: "The site type be must in its known picklist."
    expression: |
        site_type IN (
            'store',
            'ecommerce',
            'drive',
            'pickup'
        )
    null_as_exception: true

__unique__ helper

As guaranteeing uniqueness of rows is a concern in most of the use cases, the __unique__ helper is here to avoid writing complex and repetitive queries.


assertions:
  unique:
    description: "Row must be unique."
    expression: "1 = COUNT(1) OVER(PARTITION by key_1, key_2)"

The above configuration is the same as writing

assertions:
  __unique__:
    - key_1
    - key_2

You can also verify unique keys for nested/repeated structure. It will generate:

  • One assertion for the 0-depth guaranteeing uniqueness across the rows.
  • One assertion for each repeated field guaranteeing uniqueness within the row.

The following example will generate the assertions:

  • unique: Row must be unique over the unique keys.
  • nested_1__unique: Items must be unique within nested_1 in the row.
  • nested_1.nested_2__unique: Items must be unique within nested_1.nested_2 in the row.
assertions:
  __unique__:
    - key_1
    - key_2
    - nested_1:
        - key_3
        - key_4
        - nested_2:
            - key_5
            - key_6

__not_null__ helper

As guaranteeing not null values is also concern in most of the use cases, the __not_null__ helper is here to avoid writing complex and repetitive queries.


assertions:
  key_1__not_null:
    description: "key_1 is not null."
    expression: "key_1 IS NOT NULL"

  key_2__not_null:
    description: "key_2 is not null."
    expression: "key_2 IS NOT NULL"

The above configuration is the same as writing

assertions:
  __unique__:
    - key_1
    - key_2

  __not_null__:
    - key_1
    - key_2

And as the two helpers are often linked, you can rewrite the assertions as follows, which is also the same.

assertions:
  __unique__:
    - key_1
    - key_2

  __not_null__: __unique__

You can also verify unique keys for nested/repeated structure. It will generate:

  • One assertion for each column of the 0-depth guaranteeing not null.
  • One assertion for each column under the repeated field guaranteeing all the values are not null within the row.

The following example will generate the assertions:

  • key_1__not_null: key_1 is not null.
  • key_2__not_null: key_2 is not null.
  • nested_1.key_3__not_null: nested_1.key_3 are not null.
assertions:
  __not_null__:
    - key_1
    - key_2
    - nested_1:
        - key_3

Custom column name

If exceptions column is not a naming convention you like, you can still opt for a column name you choose and the macro will still work with the from_colum argument.

You can also play with multiple columns.

model:
  name: my_model
  columns:
    ...
    - errors:
      assertions:
        __unique__:
            - key_1
            - key_2

        __not_null__: __unique__

    - warns:
      assertions:
        site_creation_date_is_past:
          description: "Site must be created in the past."
          expression: "site_creation_date <= CURRENT_DATE()"

And in your model query.

WITH final AS
    (
        SELECT ...
    )
SELECT
    *,
    {{ dbt_assertions.assertions(column='errors') }},
    {{ dbt_assertions.assertions(column='warns') }}
FROM {{ ref('my_model') }}

Contribution

If you want to contribute, please open a Pull Request or an Issue on this repo. Feel free to reach me Linkedin.

Acknowledgments

Special thank to

Contact

If you have any question, please open a new Issue or feel free to reach out to Linkedin


Happy coding with dbt-assertions!

About

Package to assert rows in-line with dbt macros.

License:Apache License 2.0