Soumil Nitin Shah's starred repositories
flink-cdc-connectors
CDC Connectors for Apache Flink®
docker-hadoop-spark
Multi-container environment with Hadoop, Spark and Hive
airflow-pipeline
An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
emr-serverless-samples
Example code for running Spark and Hive jobs on EMR Serverless.
stepfunctions2processing
Configuration with AWS Step Functions and Lambdas that initiates processing from an activity state
Real-time-Data-Warehouse
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
amazon-emr-cli
A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs
dbt-redshift-demo
dbt / Amazon Redshift Demonstration Project
spark-aws-messaging
A custom sink provider for Apache Spark that sends the content of a DataFrame to an AWS SQS queue
ci-cd-serverless-spark
Demo for GitHub Universe 2022
ci-cd-serverless-spark
Sample CI/CD pipeline for using GitHub Actions with Amazon EMR Serverless Spark.
kafka-connect-mysql-s3
Example project for streaming data from a MySQL database to an AWS S3 repository
Event-Driven-S3-Glue-Transactional-Lake
Learn how to ingest data from S3 into a transactional data lake through an event-driven approach using Glue, an SQS queue, and a DLQ
Project-Using-Apache-Hudi-Deltastreamer-and-AWS-DMS-Hands-on-Lab
Project: Using Apache Hudi DeltaStreamer and AWS DMS Hands-on Lab
An-easy-to-use-Python-utility-class-for-accessing-incremental-data-from-Hudi-Data-Lakes
An easy-to-use Python utility class for accessing incremental data from Hudi Data Lakes
docker_compose_glue4.0
docker_compose_glue3.0
Sending-Weekly-Daily-CSV-Reports-FROM-Hudi-Datalake-to-Customers-via-Email-using-Glue-and-SNS-OR-SES
Sending weekly/daily CSV reports from a Hudi data lake to customers via email using Glue and SNS or SES