mitesh91 / airflow-brand


Brandless Data Engineering Take Home Exercise

Setup:

  1. Scheduler: Apache Airflow
  2. Database: PostgreSQL 10.0 (dev) / Amazon Redshift (prod)
  3. Intermediate file/object storage: local file system and Amazon S3
  4. Languages used: Python, SQL

In this exercise we set up an Airflow scheduler that queries the Edamam Recipe Search API, pulling data for pasta recipes along with the health and diet labels associated with each recipe. Daily incremental batch jobs are scheduled to load the data into three tables (a request sketch follows the list below):

  1. recipe_health_lables
  2. recipe_diet_lables
  3. recipes
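
The extraction step can be illustrated with a minimal sketch of the Edamam request. The endpoint, query parameters, and credential names below are assumptions based on Edamam's public documentation, not code taken from this repository:

```python
import requests

# Hypothetical credentials -- replace with your own Edamam keys.
EDAMAM_APP_ID = "your_app_id"
EDAMAM_APP_KEY = "your_app_key"


def fetch_pasta_recipes(start=0, end=100):
    """Pull pasta recipes (with healthLabels and dietLabels) from Edamam."""
    response = requests.get(
        "https://api.edamam.com/search",  # assumed legacy search endpoint
        params={
            "q": "pasta",
            "app_id": EDAMAM_APP_ID,
            "app_key": EDAMAM_APP_KEY,
            "from": start,
            "to": end,
        },
        timeout=30,
    )
    response.raise_for_status()
    # Each hit wraps a recipe object that carries its own label arrays.
    return [hit["recipe"] for hit in response.json()["hits"]]
```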

Each of the three tables is loaded by a separate Python task, and the tasks are organized into a single Airflow DAG (see the sketch below).
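
A minimal sketch of such a DAG, assuming Airflow 2.x import paths, is shown below. Task ids mirror the target table names; the DAG id, schedule, and task ordering are assumptions for illustration rather than the repository's actual definition:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x import path


def load_recipes():
    """Placeholder: fetch pasta recipes from Edamam and load the recipes table."""


def load_recipe_health_lables():
    """Placeholder: load health labels for each recipe."""


def load_recipe_diet_lables():
    """Placeholder: load diet labels for each recipe."""


default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="edamam_recipe_load",     # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",      # daily incremental batch run
    catchup=False,
    default_args=default_args,
) as dag:
    recipes = PythonOperator(
        task_id="load_recipes", python_callable=load_recipes
    )
    health_labels = PythonOperator(
        task_id="load_recipe_health_lables",
        python_callable=load_recipe_health_lables,
    )
    diet_labels = PythonOperator(
        task_id="load_recipe_diet_lables",
        python_callable=load_recipe_diet_lables,
    )

    # Assumed ordering: recipes are loaded first, then the two label tables.
    recipes >> [health_labels, diet_labels]
```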

Languages: Python 63.5%, Jupyter Notebook 36.5%