erika-e / dbt-tips

Collection of dbt Tips and Tricks

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

dbt-tips

Collection of dbt tips and tricks. Includes links to the dbt slack community which you can join here. See something that needs to be changed? See contributing guidelines.

Skip to a Section

dbt Concepts and Practices

New to the dbt Ecosystem? Start Here with Beginner Tutorials

Toolbox

  • dbt-utils : Many multipurpose macros. If you're writing something complex or custom, there's probably a better way using functionality from dbt-utils
  • dbt-completion.bash : autocompletion for the dbt CLI h/t
  • dbt-codegen : macros that generate dbt code to the command line h/t
  • dbt-audit-helper : Zen and the art of data auditing. This package will change your life.
  • pre-commit-dbt : Package of dbt pre-commit hooks that allow you to check quality of dbt project documentation, tests, etc
  • dbt-helper : Utility functions to compare WH to dbt, create schema files, and list dependencies
  • live Jinja Parser : Useful tool for writing complex Jinja, does not include dbt-specific Jinja functions
  • yaml Checker : yaml syntax validator
  • palm and palm-dbt: automate dbt workflows, expedite onboarding, and containerize your dbt project
  • dagrules : A linter to enforce DAG organization conventions in your project h/t
  • countries-states-cities-database: Useful dbt seed file source for geographic data and information h/t

Infrastructure and Deploying dbt

dbt CLI

Run And Test Modified and Downstream Models

dbt run -m state:modified+ && dbt test -m state:modified+ h/t and dbt's caveats

To list modifications since your last local run use dbt ls -m state:modified --state ./target/.

Run Downstream Children from a Specific Node

dbt run -m model_name_here+1 only run the children one layer downstream in the DAG or dbt run -m model_name_here+n where n is the offset

Use Dictionaries as Selectors

dbt run -m staging.airtable.* to run all models in models/staging/airtable h/t or dbt run -m staging/airtable

List All Models Downstream

dbt ls -m mymodel+

Run Models with a Specific Tag and Their Parents

dbt run -m +tag:foo h/t

Run a Specific Snapshot

dbt snapshot --select order_snapshot

Run Multiple Snapshots

dbt snapshot --select model1 model2 ... modeln

Output dbt Logs to stdout Instead of the dbt.log File

dbt --debug run gives something similar to the logs h/t

Run Tests on Sources Only

dbt test --models source:* dbt docs

Run Tests By Materialization

dbt test -m config.materialized:snapshot or dbt test -m config.materialzied:seed not guaranteed to be supported h/t and caveat

Run Tests by Tag After Tagging Models With a Specific Materialization

# dbt_project.yml
...
snapshots:
  +tags: ['snapshot']

To run use dbt test -m tag:snapshot h/t

Run A Macro With Arguments

dbt run-operation my_macro --args '{"myarg1":"arrrgh", "myarg2":"aaaaaaaargh"}'

dbt Keyboard Shortcuts

  • Command + / to block comment / uncomment yml sections
  • Command + / to block comment SQL sections in dbt cloud
  • Command + ENTER to preview data in dbt cloud
  • Use F1 in dbt cloud to bring up a list of dbt cloud keyboard shortcuts h/t

dbt Materializations

Incremental

Snapshots

dbt Test Examples

VS Code

This section is stubby for now -- but I'd welcome contributions for how to make VS Code a more effective editor for dbt work!

Command Line

Create Multiple (Blank) Model Files Following A Pattern

touch {prefix1,prefix2}_model.sql

Sometimes I'll use a spreadsheet to easily generate the list of prefixes, check out the TEXTJOIN() function in google sheets.

Find and Delete Files Matching a Pattern Including in Subdirectories

Find first using find path/to/directories/with*ifneeded -type f -name "example_file_prefix*" to test your search criteria.

When happy, find path/to/directories/with*ifneeded -type f -name "example_file_prefix*" -delete to delete. source

Find Specific Lines in a .yml File and Delete Them

First, test with sed -e '/regex_goes_here/{action goes here}' path_to_file(s)_to_act_on

An example action to delete the matching line and subsequent lines could look like {N;N;d} to delete the line and the line after it. More syntax examples for this use case

When you're happy with your output, add -i '' to the start of your command. This will replace inplace without creating a new file. This sed -i '' -e '/ - name: my_model_prefix[a-z]*/{N;N;N;d;}' models/staging/*.yml will delete the matching line and the 3 lines that follow it in the matching file(s) and line(s).

Shell Config for dbtmrt

function dbtmrt () { dbt run -m $1 && dbt test -m $1} for config dbtmrt model_name to use h/t

Two-Terminal Docs

  1. run dbt docs generate
  2. open a new terminal window to run dbt docs serve
  3. just always have the docs open in the background - re run dbt docs generate in first terminal and refresh if needed h/t

Shell Script Examples for zsh

# Functions
function dbtr() {
   dbt run -m $1 --fail-fast
   say done
}

function dbtrt() {
    dbt run -m $1 && dbt test -m $1
    say done
}

Shell Script to Compile Audit Helper

Note that the below has only been tested on a mac, and requires you to create a SQL file called audit_helper_template in the analysis folder of your dbt project.

function dbtah() {
    # substitute the model name from the argument
    gsed -i "s/model_to_audit/$1/" analysis/audit_helper_template.sql
    # enable the audit_helper_template
    gsed -i 's/enabled = false/enabled = true/' analysis/audit_helper_template.sql
    # compile
    dbt compile -m audit_helper_template
    cat target/compiled/*/analysis/audit_helper_template.sql | awk NF | pbcopy
    # modify the template back to the defaults
    gsed -i 's/enabled = true/enabled = false/' analysis/audit_helper_template.sql
    gsed -i "s/$1/model_to_audit/" analysis/audit_helper_template.sql
    say copy pasta
}
-- The model needs to be disabled so it will be ignored while in typical compilation
-- This is required because dbt won't find a node named 'model_to_audit'
-- Substitute the correct production schema and database for your environment
    {{
      config(
        enabled = false,
        )
    }}

{%- set audit_model = "model_to_audit" -%}
{%- set prod_schema = "prod_schema_name" -%}
{%- set dbt_database = "prod_database_name" -%}
{%- set dbt_relation = ref(audit_model) -%}

{%- set old_etl_relation=adapter.get_relation(
      database=dbt_database,
      schema=prod_schema,
      identifier=audit_model
) -%}

{# Generate the audit query - update primary key as needed #}
{{ audit_helper.compare_relations(
    a_relation=old_etl_relation,
    b_relation=dbt_relation,
    primary_key="primary_key_update_before_running"
) }}

To run use dbtah my_target_model. Template heavily inspired by a tool created by Lewis Davies

Git Tips

git mv to reorganize your project without losing history

If you need to move files from one directory to another, cd into the parent of the directories where the files are located, then run git mv child_dir1/model_name child_dir2/with_nested_dirs_if_needed. To make sure it worked, run git status afterwards and you should see your pending changes with renamed: ahead of the file name. Commit as usual, keep rocking with your history!

You can find lots of arguments about whether or not this really keeps the history on stack overflow and elsewhere around the internet. I've found that using this method preserves commit history, but ymmv.

Rename a file without losing history

git mv old_file_name.sql new_file_name.sql for example. Same comment about but-does-it-really from above applies.

Clean up branches and references to remote branches that have been merged

Remove local branches with git branch -d my-merged-branch-name for merged branches or git branch -D unmerged-local-branch for unmerged branches. You can run git remote show origin which will show all branches locally and on the origin. Running git fetch --prune will remove all branches in this listing that look like refs/remotes/origin/branch-name which are local snapshots of remote branches. source

Update your feature branch with new changes from master

You've been working along, and now git says your branch is out of date with the main branch. To fix this, swap over to the main branch with git checkout main and then get any new changes with git pull --rebase. Cool, now you're good locally, so switch over to your branch again with git checkout feature and use git merge main to update your branch.

I made a bunch of changes on a branch and now my changes are failing

Assuming your branch has not been pushed up to the remote repo, you can get around this! First, create a temporary branch to save your work git checkout -b temp_save and then switch back to your develpoment branch git checkout my-feat. Next, reset your branch and your working tree to the last commit you are confident in with git reset --hard 44e447a1. What this will do is roll your branch and your working tree all the way back to the commit 44e447a1. Next, you can apply the commits from your saved branch to the reset branch to 'walk forward' in time one by one until you find the problem that was introduced. To apply a commit from one branch to another, use git cherry-pick 1b132878. This will create a totally new commit on your branch with the changes from the commit 1b132878. It's not a good idea to do this if you've already pushed your branch, since you're then rewriting history and potentially others' commits.

Jinja

A live parser is an incredibly useful tool while you're learning and writing jinja. I like this one

Jinja Delimiters

  • A statement looks like {% ... %}
  • An expression looks like {{ ... }}
  • A comment looks like {# ... #} h/t
  • To escape a sequence you can use {{ '{{ my_escaped_var}}'~
  • To escape longer blocks of code you can use {% raw %} {% endraw %} Jinja2 docs

Comments that Don't Show in Compiled SQL

{# You won't see me #} vs --I will show up all day vs {# /* I won't show up AND your linter won't freak out /* #} dbt discourse

Declaring Variables

{% set my_list = ['one','two',three'] %} and accessing items from lists {{ my_list[1] }} will return two

{% set my_dict = {
    'key1': 'value1',
    'key2': 'value2'
} %}

A general best practice is to do this at the top of your file.

If Else Blocks

{% set condition = True %}

{% if condition %}
what to execute when true

{% else %}
what to execute otherwise

{% endif %}

For Loops

{% set my_list = ['one','two',three'] %}

{% for item in my_list %}

{{item}}

{% endfor %}

dbt Macros and dbt Specific Jinja

When to Use if execute in Macros

When you have queries that need to execute during the parse phase!

Macros Calling Other Macros

Example of a Macro Calling Another Macro

For Loops for Select Statements

For loops for select statements need special handling to prevent a comma after the last item in the select. The hyphens are used to trim the whitespace for better looking compiled SQL.

SELECT
{% for item in list_of_items -%}
{{item}}
{%- if not loop.last -%}
,
{%- endif %}
{% endfor -%}

Macros with Parameters

{% macro my_macro(required_parameter, optional_parameter = default_value) %}
...
{% endmacro %}

adapter - a dbt Jinja Function

adapter performs many useful functions and adapts them to the specific database context you're using

About

Collection of dbt Tips and Tricks

License:GNU General Public License v3.0