Welcome to my first dbt project!
Super rough 'How I got here' outline.
Before starting
- Extract and Load some data into BigQuery
- (Maybe needed) Set up billing in Google Cloud and attach it to the Google Cloud project you are going to use. Some of the DML commands I used required billing to be enabled. I also set up a billing budget to try to avoid a surprise bill. (You could use a local Postgres instead and avoid the risk of a bill entirely.)
- (Optional) I set up a new project in GCloud: hankanalytics
- Create a new dataset in BigQuery. In my case I made a US region dataset and called it hanka (for Hank Analytics).
- Create a new table in BigQuery. This is where the extracted data will go. My table has the same schema as bigquery-public-data.austin_bikeshare.bikeshare_trips, as that is where I "extracted" the "raw" data from.
- Extract the data. I took only a sample of the data, because I wanted the later steps to run fast:
    INSERT INTO `hankanalytics.hanka.austin_bikeshare_trips` SELECT * FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` WHERE rand() < 0.01
  Change the table you INSERT INTO to your own gcloud project, dataset, and desired table name.
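The dataset, table, and sample-load steps above can also be scripted with the bq command-line tool. This is a rough sketch, not what I originally ran: it assumes the Google Cloud SDK is installed and authenticated, uses CREATE TABLE ... LIKE to copy the public table's schema, and adds a DRY_RUN switch (my own addition, on by default) so nothing billable runs until you have read the commands.

```shell
#!/bin/sh
# Sketch only: the names below are the ones used in this README; change them to yours.
PROJECT="hankanalytics"
DATASET="hanka"
TABLE="austin_bikeshare_trips"
SOURCE="bigquery-public-data.austin_bikeshare.bikeshare_trips"
DRY_RUN="${DRY_RUN:-1}"   # assumption: 1 = only print the commands, 0 = really run them

# BigQuery quotes table names with backticks; they are escaped here so the
# shell keeps them literal instead of treating them as command substitution.
CREATE_SQL="CREATE TABLE IF NOT EXISTS \`${PROJECT}.${DATASET}.${TABLE}\` LIKE \`${SOURCE}\`"
INSERT_SQL="INSERT INTO \`${PROJECT}.${DATASET}.${TABLE}\` SELECT * FROM \`${SOURCE}\` WHERE RAND() < 0.01"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# 1. Create the US-region dataset.
run bq --location=US mk --dataset "${PROJECT}:${DATASET}"
# 2. Create an empty table with the same schema as the public table.
run bq --project_id="$PROJECT" query --use_legacy_sql=false "$CREATE_SQL"
# 3. Load roughly a 1% random sample of the rows.
run bq --project_id="$PROJECT" query --use_legacy_sql=false "$INSERT_SQL"
```

Run it once as-is to review the printed commands, then again with DRY_RUN=0 to execute them. Because RAND() is evaluated per row, the sample size will vary a little between runs.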
- Set up dbt for local development
- Install dbt for your target database. Directions here (except set it up for BigQuery, not Postgres):
https://blog.jetbrains.com/big-data-tools/2022/01/25/how-i-started-out-with-dbt/
- Create a service account in gcloud. Directions here:
https://medium.com/@ivan_toriya/step-by-step-guide-to-run-dbt-in-production-with-google-cloud-platform-fb1f035f3c7b
- Create a JSON key for the service account and download it. Do not place this key anywhere public.
- Set up the dbt profile by running dbt init. When prompted, use the service account method and the JSON key you just downloaded.
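For reference, here is a sketch of the kind of profile that dbt init ends up writing for a BigQuery service account. The profile name hankanalytics_dbt, the key file path, the thread count, and the example directory are all assumptions for illustration; the script writes to a separate directory so it does not touch an existing ~/.dbt/profiles.yml.

```shell
#!/bin/sh
# Prerequisite (not run here): python -m pip install dbt-bigquery

# Assumptions: both values below are placeholders, change them to your own.
PROFILES_DIR="${PROFILES_DIR:-./dbt_profiles_example}"
KEYFILE="${KEYFILE:-$HOME/.gcp/dbt-service-account.json}"   # hypothetical key location

mkdir -p "$PROFILES_DIR"

# Write a minimal BigQuery profile that authenticates with the JSON key.
# "hankanalytics_dbt" is a hypothetical profile name; it has to match the
# profile: value in dbt_project.yml.
cat > "$PROFILES_DIR/profiles.yml" <<EOF
hankanalytics_dbt:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: hankanalytics
      dataset: hanka
      location: US
      threads: 4
      keyfile: $KEYFILE
EOF

echo "Wrote $PROFILES_DIR/profiles.yml"
echo "Check the connection with: dbt debug --profiles-dir $PROFILES_DIR"
```

Keep the key file outside the repository, so that only its path ever appears in the profile.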
- Configure and run dbt.
- Clone or fork this project.
- Change hankanalytics everywhere to your project name.
- Change hanka everywhere to your dataset name.
- If you picked a table name other than austin_bikeshare_trips, change that in the source model.
- In the dbt_project.yml file, change the profile: parameter to match the profile you created with dbt init above.
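The "change everywhere" steps above can be done with find and sed instead of by hand. This is only a sketch: rename_refs is a hypothetical helper, it only looks at .yml and .sql files, and it assumes your new project name does not itself contain the text hanka. The project name is replaced first because hanka is a prefix of hankanalytics.

```shell
#!/bin/sh
# rename_refs DIR NEW_PROJECT NEW_DATASET
# Hypothetical helper: rewrites hankanalytics and hanka in the .yml and .sql
# files under DIR, leaving a .bak copy next to every file it changes.
rename_refs() {
  dir="$1"
  new_project="$2"
  new_dataset="$3"
  find "$dir" -type f \( -name '*.yml' -o -name '*.sql' \) -exec grep -l 'hanka' {} + |
  while IFS= read -r file; do
    # Order matters: replace the longer project name before the dataset name.
    sed -i.bak \
      -e "s/hankanalytics/${new_project}/g" \
      -e "s/hanka/${new_dataset}/g" \
      "$file"
  done
}

# Example call from the repository root (placeholder names, not run here):
# rename_refs . my-gcp-project my_dataset
```

Review the result with git diff before deleting the .bak files.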
Using the starter project
Try running the following commands:
- dbt deps
- dbt run
- dbt test
- dbt docs generate
- dbt docs serve
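If you would rather run the commands above in one go, they can be chained so that a failure stops the sequence. A small sketch: build_project and the DBT override variable are my own hypothetical names, not part of dbt.

```shell
#!/bin/sh
# DBT is a hypothetical override, handy if dbt lives in a virtualenv or is not on PATH.
DBT="${DBT:-dbt}"

# Run the build steps in order; && stops at the first command that fails.
build_project() {
  "$DBT" deps &&          # install the packages listed in packages.yml
  "$DBT" run &&           # build the models in the warehouse
  "$DBT" test &&          # run the tests defined on models and sources
  "$DBT" docs generate    # generate the documentation site
}

# Example (not run here): build everything, then serve the docs locally.
# build_project && "$DBT" docs serve
```

dbt docs serve is left out of the function because it keeps running until you stop it.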
Resources:
- Learn more about dbt in the docs
- Check out Discourse for commonly asked questions and answers
- Join the chat on Slack for live discussions and support
- Find dbt events near you
- Check out the blog for the latest news on dbt's development and best practices