Lupercio421 / data_pipelines_pocket_reference

Development work for the DPPR book (Data Pipelines Pocket Reference) by James Densmore.

Data Pipelines Pocket Reference

This repo contains my development work for Data Pipelines Pocket Reference (DPPR) by James Densmore.

Reflections

Understanding AWS Services

While working through the material, I realized how much it helps to be well-versed in AWS, especially with services like RDS (Relational Database Service) and Redshift. Both can incur costs, and I found that staying within the free tier saves you from unexpected charges. Configuring the security groups attached to a cluster inside a VPC (Virtual Private Cloud) seemed challenging at first, but with perseverance I learned to find and modify the inbound rules that control which addresses and ports can reach the database.
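As a rough illustration of what modifying an inbound rule amounts to, the sketch below builds a single TCP rule and passes it to boto3's `authorize_security_group_ingress`. It is not taken from the book: the helper names, the example IP address, and the group ID are hypothetical, and it assumes IPv4 only and that AWS credentials are already configured.

```python
import ipaddress


def build_ingress_rule(port: int, cidr: str, description: str = "") -> dict:
    """Build one inbound TCP rule in the shape the EC2 API expects."""
    # Validate the CIDR early; IPv4Network raises ValueError on bad input.
    network = ipaddress.IPv4Network(cidr, strict=False)
    ip_range = {"CidrIp": str(network)}
    if description:
        ip_range["Description"] = description
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [ip_range],
    }


def allow_inbound(group_id: str, rule: dict) -> None:
    """Apply the rule to a security group (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_ingress_rule works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=[rule])


# Hypothetical values: 5439 is Redshift's default port, 203.0.113.7 is a
# documentation-only address standing in for your own IP.
redshift_rule = build_ingress_rule(5439, "203.0.113.7/32", "my workstation")
# allow_inbound("sg-0123456789abcdef0", redshift_rule)  # hypothetical group ID
```

Restricting the rule to a single `/32` address, rather than opening the port to `0.0.0.0/0`, keeps the database reachable from your machine only.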

Contextualizing Source Code

James Densmore provides source code examples throughout the book, and it's important to approach them with a discerning eye. The examples are immensely valuable, but I encountered some syntax errors, particularly in Chapter 10, 'Measuring and Monitoring Pipeline Performance'. This taught me to examine and debug the provided code closely rather than run it as-is. For instance, troubleshooting a CSV ingestion into an S3 bucket helped me identify where an additional set of quotation marks was needed. I also learned the differences between Windows file paths and Linux file paths when working with WSL (Windows Subsystem for Linux).
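The path difference can be sketched with the standard library. The helper below is a hypothetical convenience, not something from the book; it assumes WSL's default behavior of mounting Windows drives under `/mnt/<drive letter>`, and the example file path is made up.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl(path: str) -> str:
    """Translate an absolute Windows path (C:\\...) to its WSL /mnt/c/... form."""
    win = PureWindowsPath(path)
    # Require a plain drive-letter path such as C:\...; UNC shares are not handled.
    if not win.is_absolute() or len(win.drive) != 2:
        raise ValueError(f"expected an absolute drive-letter path, got {path!r}")
    drive_letter = win.drive[0].lower()
    # win.parts[0] is the anchor ("C:\\"); the remaining parts are the segments.
    return str(PurePosixPath("/mnt", drive_letter, *win.parts[1:]))


# Hypothetical file location used only for illustration.
wsl_path = windows_to_wsl(r"C:\Users\me\data\order_extract.csv")
print(wsl_path)  # /mnt/c/Users/me/data/order_extract.csv
```

WSL also ships a `wslpath` command that performs the same translation from the shell.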

Navigating the Airflow Setup

One of the challenges I faced was setting up Airflow, a platform for orchestrating complex data workflows. Getting my Python virtual environment to interact seamlessly with the Airflow metadata database, a Postgres database hosted on RDS, was a trial-and-error process. In the future, I plan to explore running Airflow in a Docker container for a smoother setup.
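One piece of that setup is the SQLAlchemy connection string Airflow uses to reach its metadata database. The sketch below shows one way to assemble it; the function name, credentials, and host are hypothetical, and it assumes the `psycopg2` driver is installed in the virtual environment.

```python
from urllib.parse import quote_plus


def build_airflow_db_uri(
    user: str, password: str, host: str, database: str, port: int = 5432
) -> str:
    """Build a SQLAlchemy URI for a Postgres-backed Airflow metadata database."""
    # Percent-encode the credentials so characters such as '@' or '/' in a
    # password do not break URI parsing.
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical credentials and endpoint, for illustration only.
uri = build_airflow_db_uri(
    user="airflow",
    password="p@ss/word",
    host="example-host.rds.amazonaws.com",
    database="airflow_db",
)
print(uri)
```

The resulting value goes into the `sql_alchemy_conn` setting in `airflow.cfg`, or into the matching environment variable. Which config section holds that setting depends on the Airflow version (`[database]` in newer releases, `[core]` in older ones), so check the documentation for the version you installed.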

Insights from the 'Transforming Data' Chapter

The chapter on transforming data taught me a lot about querying tables with complex JOIN and WHERE clauses. These lessons are useful for anyone working on data transformation, and I found the explanations and data modeling examples especially helpful.
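To give a flavor of that kind of query, here is a small self-contained sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables, their columns, and the sample rows are all hypothetical and are not copied from the book.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        status TEXT,
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'US'), (2, 'UK');
    INSERT INTO orders VALUES
        (1, 1, 'Shipped', 10.0),
        (2, 1, 'Backordered', 5.0),
        (3, 2, 'Shipped', 7.5),
        (4, 1, 'Shipped', 2.5);
    """
)


def shipped_orders_over(conn: sqlite3.Connection, min_amount: float) -> list:
    """Join orders to customers and filter on status and amount."""
    query = """
        SELECT o.order_id, c.country, o.amount
        FROM orders AS o
        INNER JOIN customers AS c ON c.customer_id = o.customer_id
        WHERE o.status = 'Shipped'
          AND o.amount >= ?
        ORDER BY o.order_id
    """
    # The threshold is bound as a parameter instead of formatted into the SQL.
    return conn.execute(query, (min_amount,)).fetchall()


rows = shipped_orders_over(conn, 5.0)
print(rows)  # [(1, 'US', 10.0), (3, 'UK', 7.5)]
```

The same JOIN and WHERE pattern carries over to Postgres or Redshift, though syntax details can differ between engines.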

Languages

Python 98.7%, PLpgSQL 1.3%