Marina Pereira's repositories
data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
Grokking-System-Design
Systems design is the process of defining the architecture, modules, interfaces, and data for a system to satisfy specified requirements. Systems design could be seen as the application of systems theory to product development.
little-book-of-pipelines
This repository goes over how to handle massive variety in data engineering
microbatch-hourly-deduped-tutorial
This design shows how to dramatically reduce daily data latency by deduping your data both hourly and across hours using GROUP BY and FULL OUTER JOIN
spark-scala-examples
This project provides Apache Spark SQL, RDD, DataFrame and Dataset examples in the Scala language
erigon
Ethereum implementation on the efficiency frontier
Data-Engineering-HowTo
A list of useful resources to learn Data Engineering from scratch
awesome-data-engineering
A curated list of data engineering tools for software developers
Cookbook
The Data Engineering Cookbook
cumulative-table-design
This repository helps teach people how to correctly define and create cumulative tables!
Udacity-Data-Engineering-Projects
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing, and Data Lake development.
airflow
Apache Airflow
canadapandawebfe
Front Web Panda
canadapanda
Middle Panda