This demo project attempts to integrate the data lineage provided by DBT into Google Data Catalog. We will use this repo to set up a small DBT project that handles confidential information, which we should protect and label, as well as a small lineage project that uses mapping tables and transformations.
By using the meta property and the column names in the DBT manifest, we should then be able to update Google Data Catalog.
Additionally, all documentation should be available in the DBT docs.
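
As a rough illustration of the idea above, the sketch below reads column descriptions, column `meta` values, and upstream dependencies from a manifest-shaped dictionary. It is not this repo's implementation. It assumes the usual manifest layout (`nodes` containing `columns` and `depends_on`) and the default `target/manifest.json` location. The model name `customers`, the project name `demo`, and the `contains_pii` key are hypothetical placeholders. Pushing the result to Data Catalog is deliberately left out.

```python
import json
from pathlib import Path

# Hypothetical, trimmed-down manifest used only for illustration.
# A real manifest has many more keys per node.
SAMPLE_MANIFEST = {
    "nodes": {
        "model.demo.customers": {
            "resource_type": "model",
            "name": "customers",
            "depends_on": {"nodes": ["source.demo.raw.customers"]},
            "columns": {
                "email": {
                    "name": "email",
                    "description": "Customer email address",
                    "meta": {"contains_pii": True},  # hypothetical meta key
                },
                "country": {"name": "country", "description": "", "meta": {}},
            },
        }
    }
}


def load_manifest(path="target/manifest.json"):
    """Load a manifest file (assumed default location inside the dbt project)."""
    return json.loads(Path(path).read_text())


def column_metadata(manifest):
    """Yield (model, column, description, meta) for every documented model column."""
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue
        for column_name, column in node.get("columns", {}).items():
            yield (
                node["name"],
                column_name,
                column.get("description", ""),
                column.get("meta", {}),
            )


def upstream_nodes(manifest):
    """Map each model's unique id to the ids of the nodes it depends on."""
    return {
        unique_id: node.get("depends_on", {}).get("nodes", [])
        for unique_id, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") == "model"
    }


if __name__ == "__main__":
    for row in column_metadata(SAMPLE_MANIFEST):
        print(row)
    print(upstream_nodes(SAMPLE_MANIFEST))
```

To try it against a real project, replace `SAMPLE_MANIFEST` with `load_manifest()` after DBT has generated the manifest. The tuples returned by `column_metadata` are the candidates for labels in Data Catalog, and `upstream_nodes` gives the lineage edges.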
You need the following tools:
- dbt-bigquery, which includes dbt-core
- A GCP account with the BigQuery and Data Catalog APIs enabled
- gcloud CLI installed
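
A quick way to confirm the command-line tools are available is a small check like the one below. This is only a sketch: it assumes the executables are named `dbt` and `gcloud` and are on your `PATH`, and it does not verify that the GCP APIs are enabled.

```python
import shutil

# Assumed executable names for the tools listed above.
REQUIRED_TOOLS = ("dbt", "gcloud")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```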
We use Google Cloud Shell to execute all commands because it authenticates you automatically.
The project is split into 3 tiers:
- Source tier (e.g. Bronze)
- Gold tier
- Reporting or Data Mart tier