Repositories under the aws-redshift topic:
Personal Data Engineering Projects
Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and DataHub.
Redshift Python Connector. It supports the Python Database API Specification v2.0.
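Not code from the connector repository itself, just a minimal sketch of how a DB-API 2.0 style query with the redshift_connector package typically looks; the host, database, and credentials below are placeholders.

```python
import redshift_connector

# Connect to a Redshift cluster; every connection value here is a placeholder.
conn = redshift_connector.connect(
    host="examplecluster.abc123xyz789.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="my_password",
)

# Standard DB-API 2.0 flow: create a cursor, execute, fetch.
cursor = conn.cursor()
cursor.execute("SELECT current_date")
print(cursor.fetchall())

conn.close()
```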
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
The goal of this project is to track the expenses of Uber Rides and Uber Eats through data engineering processes using technologies such as Apache Airflow, AWS Redshift, and Power BI.
Developed a data pipeline to automate data warehouse ETL by building custom Airflow operators that handle the extraction, transformation, validation, and loading of data from S3 -> Redshift -> S3.
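The repository's operators aren't reproduced here; the following is a rough sketch of what a custom Airflow operator issuing a Redshift COPY from S3 might look like, assuming the Postgres provider hook and placeholder connection, table, bucket, and role names.

```python
from airflow.models import BaseOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


class StageToRedshiftOperator(BaseOperator):
    """Hypothetical operator that copies JSON files from S3 into a Redshift table."""

    copy_sql = """
        COPY {table}
        FROM 's3://{bucket}/{key}'
        IAM_ROLE '{iam_role}'
        FORMAT AS JSON 'auto';
    """

    def __init__(self, redshift_conn_id, table, bucket, key, iam_role, **kwargs):
        super().__init__(**kwargs)
        self.redshift_conn_id = redshift_conn_id
        self.table = table
        self.bucket = bucket
        self.key = key
        self.iam_role = iam_role

    def execute(self, context):
        # Redshift speaks the Postgres wire protocol, so PostgresHook can run the COPY.
        redshift = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        redshift.run(self.copy_sql.format(
            table=self.table,
            bucket=self.bucket,
            key=self.key,
            iam_role=self.iam_role,
        ))
```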
Clickstream Analytics on AWS source code
Udacity Data Engineering Nanodegree Program
:arrows_counterclockwise: :running: EtLT of my own Strava data using the Strava API, MySQL, Python, S3, Redshift, and Airflow
The project was based on an interest in data engineering and ETL pipelines. It also provided a good opportunity to develop skills and experience with a range of tools. As such, the project is more complex than required, utilising dbt, Airflow, Docker, and cloud-based storage.
An example system that captures a large stream of product usage data, or events, and provides both real-time data visualization and SQL-based data analytics.
A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance (currently powered down).
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.
Spring Boot Data JPA integration with AWS Redshift sample
This project provides valuable customer sentiment insights for Zomato by tracking and analyzing tweets related to their brand and services.
Uses AWS EMR and AWS Redshift to analyse the US adult census dataset
Example project for consuming an AWS Kinesis stream and saving the data to Amazon Redshift using Apache Spark
rdapp - Redshift Data API Postgres Proxy
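rdapp is described as a Postgres proxy in front of the Redshift Data API; for context, a minimal boto3 sketch of the Data API calls such a proxy would wrap might look like this (cluster identifier, database, and user are placeholders).

```python
import time

import boto3

client = boto3.client("redshift-data", region_name="us-east-1")

# Submit a statement asynchronously; the identifiers below are placeholders.
submitted = client.execute_statement(
    ClusterIdentifier="examplecluster",
    Database="dev",
    DbUser="awsuser",
    Sql="SELECT current_date",
)

# Poll until the statement reaches a terminal state, then fetch the result set.
statement_id = submitted["Id"]
while client.describe_statement(Id=statement_id)["Status"] not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(1)

print(client.get_statement_result(Id=statement_id)["Records"])
```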
A simple command-line tool to copy tables from Amazon Redshift to Amazon RDS (PostgreSQL).
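The tool's own CLI isn't shown here; as an illustration of the underlying idea, a small psycopg2 sketch that reads rows from Redshift (which accepts Postgres-protocol clients) and bulk-inserts them into an RDS PostgreSQL table could look like the following, with all hosts, credentials, and the customers table invented for the example.

```python
import psycopg2
from psycopg2.extras import execute_values

# Both connections use placeholder hosts and credentials.
src = psycopg2.connect(
    host="examplecluster.abc123xyz789.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="my_password",
)
dst = psycopg2.connect(
    host="example-db.abc123xyz789.us-east-1.rds.amazonaws.com",
    port=5432, dbname="app", user="appuser", password="my_password",
)

with src.cursor() as read_cur, dst.cursor() as write_cur:
    # Read the source table from Redshift and bulk-insert it into RDS.
    read_cur.execute("SELECT id, name FROM public.customers")
    execute_values(write_cur,
                   "INSERT INTO public.customers (id, name) VALUES %s",
                   read_cur.fetchall())

dst.commit()
src.close()
dst.close()
```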
Data warehouse with AWS Redshift and data visualization using Power BI
The goal of this repository is to provide good and clear examples of AWS CLI commands together with the AWS CDK for easily creating AWS services and resources
Zero-ETL integrations - Enable near real-time analytics on petabytes of transactional data
Smart City Realtime Data Engineering Project
Completed Udacity's Data Engineering Nanodegree. Went through a series of exercises and projects to learn and practice popular big data management tools.
Project 3 - Data Engineering Nanodegree
Redshift script to create a MANIFEST file recursively
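The script itself isn't reproduced here; as an illustration, a boto3 sketch that recursively lists every object under an S3 prefix and writes a Redshift COPY manifest (bucket and prefix names are made up) could look like this.

```python
import json

import boto3

s3 = boto3.client("s3")
bucket, prefix = "example-bucket", "data/events/"  # placeholder names

# Walk every object under the prefix (the paginator handles >1000 keys).
entries = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        entries.append({"url": f"s3://{bucket}/{obj['Key']}", "mandatory": True})

# Redshift COPY expects a JSON document with an "entries" array.
s3.put_object(
    Bucket=bucket,
    Key=f"{prefix}manifest.json",
    Body=json.dumps({"entries": entries}).encode("utf-8"),
)
```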
Project 5 - Data Engineering Nanodegree
Building ETL pipelines to migrate music JSON data/metadata files (semi-structured data) into a relational database stored in an AWS Redshift cluster
A quick example of how to load data from Amazon S3 into Amazon Redshift using Redshift's COPY command through Slick
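That repository issues the statement through Slick in Scala; purely for reference, the COPY statement itself, here executed from Python with redshift_connector and placeholder table, bucket, and IAM role names, looks roughly like this.

```python
import redshift_connector

# Connection values and the IAM role ARN are placeholders.
conn = redshift_connector.connect(
    host="examplecluster.abc123xyz789.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="my_password",
)

cursor = conn.cursor()
# COPY loads the S3 files into the target table in parallel across the cluster's slices.
cursor.execute("""
    COPY public.events
    FROM 's3://example-bucket/data/events/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
    FORMAT AS CSV
    IGNOREHEADER 1;
""")
conn.commit()
conn.close()
```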
This project designs and implements an ETL pipeline using Apache Airflow (Docker Compose) to ingest, process, and store retail data. AWS S3 acts as the data lake, AWS Redshift as the data warehouse, and Looker Studio for visualization. [Data Engineer]
This project is a real-time data pipeline designed for ingesting, processing, and storing telecom call records. It integrates Apache Kafka, Apache Spark Streaming, and AWS Redshift to handle large volumes of streaming data in near real-time. The pipeline is containerized with Docker Compose, enabling easy deployment, scalability, and modularity.
This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.
Banking Data Warehouse Pipeline
Flight Tragedy Analysis is a comprehensive data analysis project focused on examining aviation accidents and incidents from 1905 to 2009. This project provides users with valuable insights into historical plane crashes and their associated data.