Start Data Engineering (josephmachado)

josephmachado

Location: New York

Home Page: https://www.startdataengineering.com/

Twitter: @startdataeng

Start Data Engineering's repositories

beginner_de_project

Beginner data engineering project - batch edition

Language: HCL | License: MIT | Stargazers: 405 | Issues: 9 | Issues: 16

data_engineering_project_template

A template repository to create a data project with IAC, CI/CD, Data migrations, & testing

Language: HCL | License: MIT | Stargazers: 177 | Issues: 6 | Issues: 7

data_engineering_best_practices

Sample project to demonstrate data engineering best practices

simple_dbt_project

Code for dbt tutorial

beginner_de_project_stream

Simple stream processing pipeline

efficient_data_processing_spark

Code for "Efficient Data Processing in Spark" Course

Language: Python | Stargazers: 66 | Issues: 0 | Issues: 0

bitcoinMonitor

Near-real-time ETL to populate a dashboard.

online_store

End-to-end data engineering project

Language: Python | License: MIT | Stargazers: 45 | Issues: 3 | Issues: 3

analytical_dp_with_sql

Code for my "Efficient Data Processing in SQL" book.

socialetl

Project for "Data pipeline design patterns" blog.

Language: Python | Stargazers: 36 | Issues: 3 | Issues: 0

spark_submit_airflow

Simple repo to demonstrate how to submit a Spark job to EMR from Airflow

Language: Python | License: Apache-2.0 | Stargazers: 26 | Issues: 2 | Issues: 0

local_dev

Local development environment for Python data projects, with Docker

Language: Python | Stargazers: 21 | Issues: 0 | Issues: 0

docker_for_data_engineers

Code for blog at: https://www.startdataengineering.com/post/docker-for-de/

Language: C | Stargazers: 19 | Issues: 0 | Issues: 0

e2e_datapipeline_test

Example repo to create end-to-end tests for a data pipeline.

Language: Python | Stargazers: 17 | Issues: 2 | Issues: 0

change_data_capture

Repo for the CDC with Debezium blog post

Language: Python | Stargazers: 15 | Issues: 2 | Issues: 0

data_engineering_best_practices_log

Code to demonstrate data engineering metadata & logging best practices

Language: Python | Stargazers: 13 | Issues: 2 | Issues: 0
Language: Python | Stargazers: 13 | Issues: 0 | Issues: 0

data_test_ci

Repository showing how to automate data testing as part of CI

Language: Python | Stargazers: 9 | Issues: 0 | Issues: 0

dbt_development

Repo to explain the development and CI/CD cycle in dbt

Stargazers: 7 | Issues: 0 | Issues: 0

josephmachado

Profile readme

Stargazers: 6 | Issues: 0 | Issues: 0

unit_test_dbt

Unit test example in dbt

Language: Shell | Stargazers: 6 | Issues: 2 | Issues: 0

idempotent-data-pipeline

Making data pipelines idempotent

Language: Python | License: MIT | Stargazers: 5 | Issues: 0 | Issues: 0

trigger_spark_with_lambda

Simple example showing how to trigger a Spark job with AWS Lambda

Language: Shell | Stargazers: 4 | Issues: 0 | Issues: 0

docker-trino-cluster

Multi-node Presto cluster in Docker containers

License: Apache-2.0 | Stargazers: 3 | Issues: 0 | Issues: 0

spark_submit_airflow-

Simple repo to demonstrate how to submit a Spark job

Stargazers: 3 | Issues: 0 | Issues: 0
Language: Dockerfile | Stargazers: 2 | Issues: 0 | Issues: 0

sde_superset_demo

Apache Superset Demo

Stargazers: 2 | Issues: 0 | Issues: 0

files

Public file hosting

Language: Shell | License: Apache-2.0 | Stargazers: 1 | Issues: 2 | Issues: 0
Language: Mustache | License: Apache-2.0 | Stargazers: 1 | Issues: 1 | Issues: 0

soho

Minimalist Hugo theme based on Hyde

Language: CSS | License: MIT | Stargazers: 1 | Issues: 1 | Issues: 0