Airflow_Retail_Pipeline

This project is inspired by the video: Data Engineer Project: An end-to-end Airflow data pipeline with BigQuery, dbt, Soda, and more!

Prerequisites

  • Have Docker installed

    To install, check: Docker Desktop Install

  • Have Astro CLI installed

    If you use Homebrew, you can run: brew install astro

    For other systems, please refer to: Install Astro CLI

  • Have a Soda account

    You can get a 45-day free trial: Soda

  • Have a Google Cloud account

    You can create your account here: Google Cloud

Getting Started

  1. Run astro dev init to create the necessary files for your environment.

  2. Run astro dev start to start the Airflow services with Docker.

  3. Download the dataset from Kaggle - Online Retail.

    • Create a folder dataset inside the include directory and add the CSV file there (the first sketch after this list shows a task that uploads it to the bucket).

  4. Create a Google Cloud Storage bucket (the second sketch after this list shows a scripted alternative).

    • Create a folder called input.

  5. Create a Service Account.

    • Grant access to Cloud Storage as "Storage Admin".

    • Grant access to BigQuery as "BigQuery Admin".

  6. Create a JSON key for the Service Account.

    • Create a folder gcp inside the include directory and add your JSON key there.

  7. Create a connection in the Airflow UI using the path of the JSON key (the third sketch after this list shows a quick way to verify it).
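
As a reference for steps 3 and 7, here is a minimal sketch of a DAG task that pushes the local CSV into the bucket with the Google provider's LocalFilesystemToGCSOperator. The connection id gcp, the bucket name your-retail-bucket, the CSV filename, and the destination object input/raw_invoices.csv are assumptions; adjust them to your setup.

```python
# Minimal sketch: upload include/dataset/online_retail.csv to the GCS bucket.
# The conn id "gcp", the bucket name, and the file names are assumptions.
from datetime import datetime

from airflow.decorators import dag
from airflow.providers.google.cloud.transfers.local_to_gcs import (
    LocalFilesystemToGCSOperator,
)


@dag(start_date=datetime(2024, 1, 1), schedule=None, catchup=False, tags=["retail"])
def retail():
    # /usr/local/airflow is the project root inside the Astro Runtime container.
    LocalFilesystemToGCSOperator(
        task_id="upload_csv_to_gcs",
        src="/usr/local/airflow/include/dataset/online_retail.csv",
        dst="input/raw_invoices.csv",
        bucket="your-retail-bucket",
        gcp_conn_id="gcp",
        mime_type="text/csv",
    )


retail()
```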
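
If you would rather script step 4 than click through the console, a sketch with the google-cloud-storage client looks like this (assuming Application Default Credentials, e.g. from gcloud auth application-default login, and placeholder project and bucket names):

```python
# Sketch of step 4 in code; the project id and bucket name are placeholders.
from google.cloud import storage

client = storage.Client(project="your-project-id")
bucket = client.create_bucket("your-retail-bucket", location="US")

# GCS has no real folders: writing an object under the input/ prefix
# makes the "input" folder show up in the console.
bucket.blob("input/.keep").upload_from_string("")
```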
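
Once the connection from step 7 exists, you can verify it from inside the Airflow container (for example via astro dev bash) with the provider's GCSHook; the connection id gcp and the bucket name are the same assumptions as above:

```python
# Sanity check for the "gcp" connection; run inside the Airflow container.
from airflow.providers.google.cloud.hooks.gcs import GCSHook

hook = GCSHook(gcp_conn_id="gcp")
# Should print the objects in the bucket, e.g. input/raw_invoices.csv.
print(hook.list("your-retail-bucket"))
```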

About

This project is inspired by the video: Data Engineer Project: An end-to-end Airflow data pipeline with BigQuery, dbt, Soda, and more!

License: MIT License


Languages

Python 99.4%
Dockerfile 0.6%