Start Data Engineering's repositories
beginner_de_project
Beginner data engineering project - batch edition
data_engineering_project_template
A template repository to create a data project with IaC, CI/CD, data migrations, and testing
data_engineering_best_practices
Sample project to demonstrate data engineering best practices
simple_dbt_project
Code for the dbt tutorial
beginner_de_project_stream
Simple stream processing pipeline
efficient_data_processing_spark
Code for "Efficient Data Processing in Spark" Course
bitcoinMonitor
Near-real-time ETL to populate a dashboard.
online_store
End to end data engineering project
analytical_dp_with_sql
Code for my "Efficient Data Processing in SQL" book.
spark_submit_airflow
Simple repo to demonstrate how to submit a Spark job to EMR from Airflow
docker_for_data_engineers
Code for blog at: https://www.startdataengineering.com/post/docker-for-de/
e2e_datapipeline_test
Example repo to create end-to-end tests for a data pipeline.
change_data_capture
Repo for the CDC with Debezium blog post
data_engineering_best_practices_log
Code to demonstrate data engineering metadata & logging best practices
data_test_ci
Repository showing how to automate data testing as part of CI
dbt_development
Repo to explain the development and CI/CD cycle in dbt
josephmachado
Profile README
unit_test_dbt
Unit test example in dbt
idempotent-data-pipeline
Making data pipelines idempotent
trigger_spark_with_lambda
Simple example showing how to trigger a Spark job with AWS Lambda
docker-trino-cluster
Multi-node Presto cluster on Docker containers
spark_submit_airflow-
Simple repo to demonstrate how to submit a Spark job
sde_superset_demo
Apache Superset Demo