yashk1 / de-zoomcamp

Notes and exercises from the DataTalksClub Data Engineering Zoomcamp

de-zoomcamp

This repository contains notes and exercises I made while taking the Data Engineering Zoomcamp offered by DataTalksClub.

Content

Data used: Yellow Taxi Data New York

The data can be downloaded using: wget https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2021-01.csv
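
For a scripted download, the same file can also be fetched from Python. This is a minimal sketch using only the standard library. It assumes that other months follow the same file naming pattern as the 2021-01 file in the wget command, which these notes do not confirm. `build_url` and `download` are hypothetical helper names.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL taken from the wget command in these notes.
BASE_URL = "https://s3.amazonaws.com/nyc-tlc/trip+data"


def build_url(year: int, month: int, color: str = "yellow") -> str:
    """Build the CSV URL for one month of trip data.

    Assumption: other months use the same naming pattern as 2021-01.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return f"{BASE_URL}/{color}_tripdata_{year}-{month:02d}.csv"


def download(year: int, month: int, target_dir: str = ".") -> Path:
    """Download one month of data and return the local file path."""
    url = build_url(year, month)
    target = Path(target_dir) / url.rsplit("/", 1)[-1]
    urlretrieve(url, target)  # performs the actual network request
    return target


if __name__ == "__main__":
    print(download(2021, 1))
```

Keeping the URL construction separate from the download makes the naming logic easy to check without network access.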

Week 1: Introduction & Prerequisites

  • Postgres
    • Load the data into a database
    • Use pgcli to connect to Postgres
  • pgAdmin
    • Use the web interface to look at the data
  • Docker
    • Getting started with Docker
    • Use Docker to start Postgres
    • Use Docker to start pgAdmin
    • Use both in the same network
  • docker-compose
    • Use one YAML file to start pgAdmin and Postgres in the same network
  • Introduction to Terraform
  • Introduction to Google Cloud
  • Homework
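
The "load the data into a database" step from the Postgres item can be sketched as a chunked insert. To keep the example runnable without a database server, it uses `sqlite3` from the standard library as a stand-in for Postgres. With Postgres, the connection would come from a driver such as psycopg2 instead, and the `?` placeholders would need to match that driver's parameter style. The table name `yellow_taxi_data`, the helper `load_csv`, and the sample rows are assumptions made for illustration, not data from the real file.

```python
import csv
import io
import sqlite3
from itertools import islice

# Made-up sample rows using a few of the yellow taxi column names.
SAMPLE_CSV = """VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,total_amount
1,2021-01-01 00:30:10,2021-01-01 00:36:12,1,2.10,11.80
1,2021-01-01 00:51:20,2021-01-01 00:52:19,1,0.20,4.30
2,2021-01-01 00:43:30,2021-01-01 01:11:06,1,14.70,51.95
"""


def load_csv(conn, lines, table="yellow_taxi_data", chunk_size=2):
    """Insert CSV rows into `table` in chunks and return the row count.

    Reading in chunks keeps memory use bounded for large files.
    """
    reader = csv.reader(lines)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')

    total = 0
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', chunk)
        conn.commit()  # one commit per chunk
        total += len(chunk)
    return total


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    inserted = load_csv(connection, io.StringIO(SAMPLE_CSV))
    print(f"inserted {inserted} rows")
```

For the real file, `lines` would be an open file handle and `chunk_size` would be much larger than the value used here for demonstration.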

Week 2: Workflow Orchestration

  • Data Lake
  • Workflow orchestration
  • Introduction to Prefect
  • ETL with GCP & Prefect
    • Store data in GCS and BigQuery
  • Parametrizing workflows
  • Prefect Cloud and additional resources
  • Homework
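
The idea behind the "Parametrizing workflows" item is that the dataset, year, and months become arguments instead of hard-coded values, so one flow definition can be reused for many runs. The sketch below shows that structure in plain Python, deliberately without Prefect, so that it runs without extra dependencies. In Prefect 2, functions like these would typically be decorated with `@task` and `@flow`. All function names, the row layout, and the in-memory `sink` list (standing in for GCS or BigQuery) are hypothetical.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def extract(color: str, year: int, month: int) -> List[Row]:
    """Stand-in for fetching one month of data; returns made-up rows."""
    source = f"{color}_tripdata_{year}-{month:02d}"
    return [
        {"source": source, "passenger_count": 1},
        {"source": source, "passenger_count": 0},
    ]


def transform(rows: List[Row]) -> List[Row]:
    """Drop trips that report no passengers."""
    return [row for row in rows if row["passenger_count"] > 0]


def load(rows: List[Row], sink: List[Row]) -> int:
    """Append rows to the sink and return how many were written."""
    sink.extend(rows)
    return len(rows)


def etl_flow(color: str, year: int, month: int, sink: List[Row]) -> int:
    """Run extract, transform, and load for a single month."""
    return load(transform(extract(color, year, month)), sink)


def parent_flow(color: str, year: int, months: Iterable[int], sink: List[Row]) -> int:
    """Run the single-month flow once per requested month."""
    return sum(etl_flow(color, year, month, sink) for month in months)


if __name__ == "__main__":
    warehouse: List[Row] = []
    written = parent_flow("yellow", 2021, [1, 2, 3], warehouse)
    print(f"wrote {written} rows")
```

Splitting the work into a single-month flow and a parent flow that loops over months keeps each run small and makes it possible to rerun one month on its own.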

Week 3: Data Warehouse

Week 4: Analytics Engineering

Week 5: Batch Processing

Week 6: Streaming

Week 7, 8 & 9: Project

Languages

Jupyter Notebook 94.3%, Python 4.4%, HCL 1.2%, Dockerfile 0.1%