Soumil Nitin Shah (soumilshah1995)

Company: Lead Data Engineer | AWS & Apache Hudi Expert | Spark & AWS Glue Enthusiast | YouTuber

Location: New York

Home Page: https://soumilshah.com/

Soumil Nitin Shah's repositories

install-external-python-packages-on-serverless

Install external Python packages on serverless

Language: Python | License: Apache-2.0 | Stargazers: 37 | Issues: 2 | Issues: 1

PythonLambdaDockerECR

fastapi-python

Learn how to build and deploy a FastAPI app in Python using Docker

run-aws-glue-locally-docker

License: Apache-2.0 | Stargazers: 8 | Issues: 2 | Issues: 0

Unlocking-Incremental-Data-in-PySpark-Extracting-from-JDBC-Sources-without-Debezium-or-AWS-DMS-with

Unlocking Incremental Data in PySpark: Extracting from JDBC Sources without Debezium or AWS DMS with CDC

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 4 | Issues: 1 | Issues: 0

Efficient-Data-Ingestion-with-Glue-Concurrency-Using-a-Single-Template-for-Multiple-S3-Tables-into-

Efficient Data Ingestion with Glue Concurrency: Using a Single Template for Multiple S3 Tables into a Transactional Hudi Data Lake

License: Apache-2.0 | Stargazers: 3 | Issues: 1 | Issues: 0

Project-Using-Apache-Hudi-Deltastreamer-and-AWS-DMS-Hands-on-Lab

Project: Using Apache Hudi Deltastreamer and AWS DMS, Hands-on Labs

License: Apache-2.0 | Stargazers: 3 | Issues: 3 | Issues: 0

Power-your-Down-Stream-Elastic-Search-Stack-From-Apache-Hudi-Transaction-Datalake-with-CDC

Power Your Downstream Elasticsearch Stack from an Apache Hudi Transactional Data Lake with CDC

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 0 | Issues: 0

source-to-target-mapping-python

Source-to-target mapping in Python

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 0 | Issues: 0

-Architecture-Powering-Down-Stream-System-with-CDC-from-HUDI-Transactional-Datalake-

Architecture: Powering Downstream Systems with CDC from a Hudi Transactional Data Lake

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 2 | Issues: 0

aws-glue-studio-and-clickhous-etl-job

AWS Glue Studio and ClickHouse Integration

emr-serverless-labs-cli

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 2 | Issues: 0

How-do-I-read-data-from-Cross-Account-S3-Buckets-and-Build-Hudi-Transactional-Datalake-in-Central-AW

How Do I Read Data from Cross-Account S3 Buckets and Build a Hudi Transactional Data Lake in a Central AWS Account

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 1 | Issues: 0 | Issues: 0

localGPT

Chat with your documents on your local device using GPT models. No data leaves your device and 100% private.

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 1 | Issues: 0

Advantages-of-Metadata-Indexing-and-Asynchronous-Indexing-in-Hudi-Hands-on-Lab

Advantages of Metadata Indexing and Asynchronous Indexing in Hudi: Hands-on Lab

License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

amazon-emr-cli

A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0
Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

Bootstrapping-in-Apache-Hudi-on-EMR-Serverless

Bootstrapping in Apache Hudi on EMR Serverless

License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

Change-Data-Capture-in-Apache-Hudi-

Change Data Capture in Apache Hudi: Hands-on Lab

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

ci-cd-serverless-spark-1

Sample CI/CD pipeline for using GitHub Actions with Amazon EMR Serverless Spark.

License: MIT-0 | Stargazers: 0 | Issues: 0 | Issues: 0

Clustering-in-Hudi-hands-on-Labs

Clustering in Hudi: Hands-on Labs

License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

Efficient-Data-Lake-Management-with-Apache-Hudi-Cleaner-Benefits-of-Scheduling-Data-Cleaning

Efficient Data Lake Management with Apache Hudi Cleaner: Benefits of Scheduling Data Cleaning

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

How-to-Query-Hudi-Tables-in-Incremnetal-Fashion-and-Get-only-New-data-on-AWS-GLue-

How to Query Hudi Tables Incrementally and Get Only New Data on AWS Glue

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

Incremental-Processing-Pipeline-to-power-Aurora-Postgres-SQL-from-Hudi-Transcational-Datalake-

Incremental Processing Pipeline to Power Aurora PostgreSQL from a Hudi Transactional Data Lake

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

Learn-about-Apache-Hudi-Transformers-with-Hands-on-Lab

Learn about Apache Hudi Transformers with a Hands-on Lab

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

Learn-How-to-Interrelate-Apache-Hudi-with-Redshift-Spectrum-Hands-on-Labs

Learn How to Integrate Apache Hudi with Redshift Spectrum: Hands-on Labs

License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

Lets-Build-CDC-Pipeline-from-Microsoft-SQL-Server-into-Apache-Hudi-Transactional-Datalake-

Let's Build a CDC Pipeline from Microsoft SQL Server into an Apache Hudi Transactional Data Lake

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

querybook

Querybook is a Big Data Querying UI, combining collocated table metadata and a simple notebook interface.

Language: TypeScript | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

Running-Apache-Hudi-Delta-Streamer-On-EMR-Serverless-Hands-on-Lab-step-by-step-guide-for-beginners

Running Apache Hudi Delta Streamer on EMR Serverless: Hands-on Lab, a Step-by-Step Guide for Beginners

License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

Step-by-Step-Guide-to-Incrementally-Pulling-Data-from-JDBC-with-Python-and-PySpark

Step-by-Step Guide to Incrementally Pulling Data from JDBC with Python and PySpark

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0