dbt-labs / snowflake-resource-monitoring

Welcome to the Snowflake Resource Monitoring Project!

NOTE: This is Snowflake-only at present (with a thought on how to extend to other warehouses :) )

Background

Monitoring data warehouse utilization levels becomes a key consideration as analytics organizations mature and grow. In the dbt community, the Snowflake spend package has gained some traction as an approach for doing this. Snowflake spend is great for analyzing workloads by database, schema, user, and warehouse. However, to really identify potential bottlenecks and areas for improvement, it's useful to go deeper and understand trends in model run times, configuration, user behavior in development vs. production environments, and so on.

Setup

Add the macro code from config/set_query_tag.sql to the dbt Snowflake projects you want to monitor. Doing this will override the default set_query_tag() macro that ships with dbt; a sketch of such an override is shown after the list below. This version of the package uses some dbt Cloud-specific environment variables:

  • DBT_CLOUD_JOB_ID
  • DBT_CLOUD_RUN_ID
  • DBT_CLOUD_RUN_REASON
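
For orientation, here is a minimal sketch of what such an override could look like. This is not the actual contents of config/set_query_tag.sql: the JSON keys, the model/target fields, and the use of get_current_query_tag() are assumptions, and macro dispatch details vary by dbt version.

```sql
{# Sketch of an overriding set_query_tag() macro -- not the actual
   config/set_query_tag.sql. It builds a JSON query tag from the dbt Cloud
   environment variables above (falling back to empty strings outside dbt Cloud),
   sets it on the Snowflake session, and returns the original tag so dbt can
   restore it afterwards. Assumes it runs in a model context where `model`
   and `target` are available. #}
{% macro set_query_tag() -%}

  {% set tag_dict = {
      "dbt_cloud_job_id":     env_var("DBT_CLOUD_JOB_ID", ""),
      "dbt_cloud_run_id":     env_var("DBT_CLOUD_RUN_ID", ""),
      "dbt_cloud_run_reason": env_var("DBT_CLOUD_RUN_REASON", ""),
      "model_name":           model.name,
      "target_name":          target.name
  } %}
  {% set new_query_tag = tojson(tag_dict) %}

  {# Remember the session's current tag so unset_query_tag() can restore it. #}
  {% set original_query_tag = get_current_query_tag() %}

  {% do run_query("alter session set query_tag = '" ~ new_query_tag ~ "'") %}
  {{ return(original_query_tag) }}

{%- endmacro %}
```

With a tag like this in place, every query dbt issues carries the run metadata in Snowflake's query history, where it can be joined back to dbt Cloud jobs and runs.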

Potential Future Directions

  • Make this approach generic across data warehouses by logging model and test runs via an on-run-end hook + macro (see the sketch after this list)
  • Build additional reporting and pre-canned alerts
  • Build time series anomaly detection on usage data with dbt + Python models
  • Semantic layer integration with partners for pre-canned monitoring dashboards
  • Create similar visuals in Administrative reporting within dbt Cloud?
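
As a rough illustration of the first direction above, an on-run-end hook could pass dbt's results object to a logging macro. Everything in this sketch (the log_run_results name, the audit table, and its columns) is hypothetical and not part of this package:

```sql
{# macros/log_run_results.sql -- hypothetical sketch only.
   Wire it up in dbt_project.yml with:
     on-run-end:
       - "{{ log_run_results(results) }}"
   The audit table name and columns are assumptions. #}
{% macro log_run_results(results) %}
  {% if execute %}
    {# `results` is the list of node results dbt exposes to on-run-end hooks #}
    {% for res in results %}
      {% do run_query(
        "insert into analytics.dbt_audit.run_results (node_id, status, execution_time_s)"
        ~ " values ('" ~ res.node.unique_id ~ "', '" ~ res.status ~ "', " ~ res.execution_time ~ ")"
      ) %}
    {% endfor %}
  {% endif %}
{% endmacro %}
```

Because the hook only issues plain INSERT statements, the same pattern could carry over to warehouses other than Snowflake.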
