Dorian Teffo's starred repositories
public-apis
A collective list of free APIs
data-engineering-zoomcamp
Free Data Engineering course!
data-engineering-practice
Data Engineering Practice Problems
ssh-deploy
GitHub Action for deploying code via rsync over SSH (with NodeJS).
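For context, a minimal sketch of the rsync-over-SSH call an action like this wraps; the host, user, key path, and directories are all hypothetical.

```python
import subprocess

# Hypothetical host, user, key, and paths; this mirrors the kind of
# rsync call that ssh-deploy-style actions run under the hood.
subprocess.run(
    [
        "rsync",
        "-avz",       # archive mode, verbose, compress in transit
        "--delete",   # remove remote files that no longer exist locally
        "-e", "ssh -i ~/.ssh/deploy_key -o StrictHostKeyChecking=no",
        "./dist/",                            # local build output
        "deploy@example.com:/var/www/app/",   # remote target
    ],
    check=True,
)
```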
data_engineering_project_template
A template repository to create a data project with IaC, CI/CD, data migrations, & testing
pypi-duck-flow
End-to-end data engineering project to get insights from PyPI using Python, DuckDB, MotherDuck & Evidence
beginner_de_project_stream
Simple stream processing pipeline
bitcoinMonitor
Near-real-time ETL to populate a dashboard.
online_store
End-to-end data engineering project
modern-data-platform
End-to-end data platform leveraging the modern data stack
unitTestPySpark
How to unit test your PySpark code
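A minimal sketch of the pattern this repo teaches, assuming a local pyspark install; `add_total` is a hypothetical transformation under test.

```python
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

@pytest.fixture(scope="session")
def spark():
    # Local SparkSession so tests run without a cluster.
    session = SparkSession.builder.master("local[1]").appName("tests").getOrCreate()
    yield session
    session.stop()

def add_total(df):
    # Hypothetical transformation under test: price * quantity.
    return df.withColumn("total", F.col("price") * F.col("quantity"))

def test_add_total(spark):
    df = spark.createDataFrame([(2.0, 3)], ["price", "quantity"])
    result = add_total(df).collect()
    assert result[0]["total"] == 6.0
```

Keeping the SparkSession fixture session-scoped matters: spinning up a JVM per test makes the suite painfully slow.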
DataEngineeringProjects
Example end-to-end projects for data engineers to build.
crypto_api_kafka_airflow_streaming
Get crypto data from an API and stream it to Kafka with Airflow; write the data to MySQL and visualize it with Metabase
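A minimal sketch of the extract-and-produce step (not the repo's actual code), assuming kafka-python and a local broker; the topic name is a placeholder, and the public CoinGecko endpoint stands in for whichever API the pipeline uses.

```python
import json
import requests
from kafka import KafkaProducer  # pip install kafka-python

# Broker address and topic name are assumptions for illustration.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Fetch the current BTC price from a public endpoint.
resp = requests.get(
    "https://api.coingecko.com/api/v3/simple/price",
    params={"ids": "bitcoin", "vs_currencies": "usd"},
    timeout=10,
)
resp.raise_for_status()

producer.send("crypto_prices", resp.json())
producer.flush()
```

In the repo, an Airflow task would run this on a schedule rather than as a one-off script.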
etl_pipeline_docker_metabase
Data pipeline to build a data warehouse on Postgres
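A minimal sketch of such a load step, assuming pandas, SQLAlchemy, and a reachable Postgres; the connection string, CSV path, and table name are placeholders.

```python
import pandas as pd
from sqlalchemy import create_engine  # pip install sqlalchemy psycopg2-binary

# Connection string and table name are placeholders.
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/warehouse")

# Extract: read a raw CSV export (path is hypothetical).
orders = pd.read_csv("data/orders.csv", parse_dates=["order_date"])

# Transform: light cleanup before loading.
orders = orders.dropna(subset=["order_id"]).drop_duplicates("order_id")

# Load: append into a warehouse table, creating it on first run.
orders.to_sql("fact_orders", engine, if_exists="append", index=False)
```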
data-engineering-projects
Welcome to my data engineering projects repository! Here you will find a collection of data engineering projects that I have worked on.
DuckdbAndDeltaLake
Learning how to query a remote Delta Lake table on S3 with DuckDB.
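A minimal sketch of what that looks like, assuming a recent DuckDB with the delta and httpfs extensions available; the bucket and table path are hypothetical.

```python
import duckdb  # pip install duckdb

con = duckdb.connect()
# delta provides delta_scan(); httpfs handles the s3:// access.
for ext in ("delta", "httpfs"):
    con.install_extension(ext)
    con.load_extension(ext)

# Pick up AWS credentials from the usual environment/config chain
# (uses the aws extension, autoloaded in recent DuckDB builds).
con.sql("CREATE SECRET (TYPE s3, PROVIDER credential_chain)")

# Bucket and table path are hypothetical.
rows = con.sql(
    "SELECT count(*) AS n FROM delta_scan('s3://my-bucket/my-delta-table/')"
).fetchone()
print(rows)
```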
vg-sales-glue-spark-terraform
ETL job with AWS Glue
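For reference, the standard Glue PySpark job skeleton a project like this builds on; it only runs inside a Glue job, and the S3 paths are placeholders.

```python
# Standard AWS Glue PySpark boilerplate; runs inside a Glue job,
# not on a plain laptop. Paths below are placeholders.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw CSVs from S3 into a DynamicFrame.
sales = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-bucket/raw/vgsales/"]},
    format="csv",
    format_options={"withHeader": True},
)

# Write back as Parquet to the curated zone.
glue_context.write_dynamic_frame.from_options(
    frame=sales,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/curated/vgsales/"},
    format="parquet",
)

job.commit()
```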
INSERT-UPDATE-DELETE-READ-CRUD-on-Delta-lakes-S3-using-Glue-PySpark-Custom-Jar-Files-Athen
INSERT | UPDATE | DELETE | READ (CRUD) on Delta Lake (S3) using Glue, PySpark, custom JAR files & Athena
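The UPDATE/INSERT half of that CRUD story is usually a Delta MERGE; a minimal sketch with delta-spark, where the table path and join key are hypothetical (s3a access additionally needs the hadoop-aws jars and credentials).

```python
from delta import configure_spark_with_delta_pip
from delta.tables import DeltaTable  # pip install delta-spark
from pyspark.sql import SparkSession

# Delta-enabled SparkSession, per the delta-spark quickstart.
builder = (
    SparkSession.builder.appName("crud-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Incoming changes; schema and join key are hypothetical.
updates = spark.createDataFrame([(1, "shipped")], ["order_id", "status"])
target = DeltaTable.forPath(spark, "s3a://my-bucket/delta/orders/")

# MERGE covers the UPDATE and INSERT halves of CRUD in one statement.
(
    target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```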
ci_cd_lambda
Use Docker, Terraform, and GitHub Actions to deploy Lambda code to AWS
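The final deploy step often boils down to a single boto3 call; a minimal sketch where the function name and ECR image URI are placeholders, and the real repo drives this from CI rather than by hand.

```python
import boto3  # pip install boto3

# Function name, region, and image URI are placeholders.
client = boto3.client("lambda", region_name="us-east-1")

# Point the existing Lambda at a freshly pushed container image.
response = client.update_function_code(
    FunctionName="my-etl-function",
    ImageUri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-etl:latest",
)
print(response["LastModified"])
```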
dbt_python_docker
Use dbt to create a star schema
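A star schema splits measures from descriptive attributes. This is not the repo's dbt code, just a pandas sketch of the same idea: deriving dimension and fact tables, with surrogate keys, from a flat extract with hypothetical columns.

```python
import pandas as pd

# Flat extract with one row per order line (columns are hypothetical).
flat = pd.DataFrame({
    "order_id": [1, 2],
    "customer_name": ["Ada", "Grace"],
    "product_name": ["widget", "gadget"],
    "amount": [9.99, 24.50],
})

# Dimensions: one row per distinct entity, with a surrogate key.
dim_customer = flat[["customer_name"]].drop_duplicates().reset_index(drop=True)
dim_customer["customer_key"] = dim_customer.index

dim_product = flat[["product_name"]].drop_duplicates().reset_index(drop=True)
dim_product["product_key"] = dim_product.index

# Fact table: measures plus foreign keys into the dimensions.
fact_sales = (
    flat.merge(dim_customer, on="customer_name")
        .merge(dim_product, on="product_name")
        [["order_id", "customer_key", "product_key", "amount"]]
)
```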
analytics-project-SQL-PowerBI
Used SQL and Power BI to surface key metrics and give an e-commerce business insight into its data.
designer-website-scraping
Extract designer information from https://www.dexigner.com/
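A minimal scraping sketch with requests and BeautifulSoup; the CSS selector is a guess, so inspect the live page before relying on it.

```python
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Fetch the landing page; the selector below is a guess at how
# designer links are marked up, not the repo's actual logic.
resp = requests.get(
    "https://www.dexigner.com/",
    headers={"User-Agent": "Mozilla/5.0"},
    timeout=10,
)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
designers = [a.get_text(strip=True) for a in soup.select("a[href*='/design']")]
print(designers[:10])
```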
uber-eats-airflow-spark-glue-athena
Ingest CSV files and load them to S3, upload a Spark script to S3, and run the Spark job on an EMR cluster, which pulls the raw UberEats data from S3, cleans it, and loads it back to S3 in the proper schema. All of this is orchestrated with Airflow.
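A skeleton of the orchestration layer only, assuming Airflow 2.4+; the task bodies are placeholders where the repo wires in real S3 uploads and an EMR step.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def upload_raw_to_s3():
    ...  # push the UberEats CSVs to s3://my-bucket/raw/ (placeholder)

def upload_spark_script():
    ...  # push the cleaning job to s3://my-bucket/scripts/ (placeholder)

def submit_emr_step():
    ...  # add a spark-submit step to the EMR cluster via boto3 (placeholder)

with DAG(
    dag_id="uber_eats_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    raw = PythonOperator(task_id="upload_raw", python_callable=upload_raw_to_s3)
    script = PythonOperator(task_id="upload_script", python_callable=upload_spark_script)
    emr = PythonOperator(task_id="run_on_emr", python_callable=submit_emr_step)

    # Both uploads must finish before the EMR job starts.
    [raw, script] >> emr
```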
unit_test_sample
Run simple unit tests
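A minimal pytest example of what "simple unit tests" means in practice; `slugify` is a hypothetical function under test.

```python
# Minimal pytest example (pip install pytest, run with `pytest`).
def slugify(title: str) -> str:
    # Function under test: lowercase and hyphenate a title.
    return "-".join(title.lower().split())

def test_slugify():
    assert slugify("Hello World") == "hello-world"

def test_slugify_extra_spaces():
    assert slugify("  Data   Engineering ") == "data-engineering"
```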