Soumil Nitin Shah (soumilshah1995)

Company: Lead Data Engineer | AWS & Apache Hudi Expert | Spark & AWS Glue Enthusiast | YouTuber

Location: New York

Home Page: https://soumilshah.com/

Soumil Nitin Shah's starred repositories

localGPT

Chat with your documents on your local device using GPT models. No data leaves your device and 100% private.

Language: Python | License: Apache-2.0 | Stargazers: 19687 | Issues: 166 | Issues: 535

presto

The official home of the Presto distributed SQL query engine for big data

Language: Java | License: Apache-2.0 | Stargazers: 15818 | Issues: 860 | Issues: 6500

hudi

Upserts, Deletes And Incremental Processing on Big Data.

Language: Java | License: Apache-2.0 | Stargazers: 5256 | Issues: 1170 | Issues: 3131

flink-cdc-connectors

CDC Connectors for Apache Flink®

Language: Java | License: Apache-2.0 | Stargazers: 5002 | Issues: 116 | Issues: 1631

winutils

winutils.exe hadoop.dll and hdfs.dll binaries for hadoop windows

querybook

Querybook is a Big Data Querying UI, combining collocated table metadata and a simple notebook interface.

Language: TypeScript | License: Apache-2.0 | Stargazers: 1841 | Issues: 33 | Issues: 219

onetable

OneTable is an omni-directional converter for table formats that facilitates interoperability across data processing systems and query engines.

Language: Java | License: Apache-2.0 | Stargazers: 562 | Issues: 18 | Issues: 144

scanns

A scalable nearest neighbor search library in Apache Spark

Language: Scala | License: NOASSERTION | Stargazers: 257 | Issues: 15 | Issues: 9

docker-hadoop-spark

Multi-container environment with Hadoop, Spark and Hive

Language: Shell | Stargazers: 186 | Issues: 9 | Issues: 0

airflow-pipeline

An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR

Language: Python | License: Apache-2.0 | Stargazers: 171 | Issues: 24 | Issues: 13

emr-serverless-samples

Example code for running Spark and Hive jobs on EMR Serverless.

Language: Python | License: MIT-0 | Stargazers: 145 | Issues: 8 | Issues: 23

stepfunctions2processing

Configuration with AWS step functions and lambdas which initiates processing from activity state

Language: Python | License: MIT | Stargazers: 121 | Issues: 6 | Issues: 4

Real-time-Data-Warehouse

Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi

amazon-emr-cli

A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs

Language: Python | License: Apache-2.0 | Stargazers: 34 | Issues: 5 | Issues: 19

dbt-redshift-demo

dbt / Amazon Redshift Demonstration Project

Language: Shell | License: Apache-2.0 | Stargazers: 30 | Issues: 3 | Issues: 0

localemr

Local AWS EMR - A local service that imitates AWS EMR

Language: Python | License: Apache-2.0 | Stargazers: 23 | Issues: 5 | Issues: 0

spark-aws-messaging

A custom sink provider for Apache Spark that sends the content of a dataframe to an AWS SQS

Language: Java | License: MIT | Stargazers: 18 | Issues: 2 | Issues: 4

ci-cd-serverless-spark

Demo for GitHub Universe 2022

Language: Python | Stargazers: 12 | Issues: 2 | Issues: 0

Language: Python | License: NOASSERTION | Stargazers: 7 | Issues: 0 | Issues: 0

ci-cd-serverless-spark

Sample CI/CD pipeline for using GitHub Actions with Amazon EMR Serverless Spark.

Language: Python | License: MIT-0 | Stargazers: 5 | Issues: 6 | Issues: 0

kafka-connect-mysql-s3

Example project of streaming data from mysql database to AWS S3 repository

Language: Dockerfile | Stargazers: 4 | Issues: 0 | Issues: 0

Event-Driven-S3-Glue-Transactional-Lake

Learn and Develop How to ingest data from S3 into Transactional Data lake through event driven approach using Glue and SQS queue and DLQ

License: Apache-2.0 | Stargazers: 4 | Issues: 2 | Issues: 0

Project-Using-Apache-Hudi-Deltastreamer-and-AWS-DMS-Hands-on-Lab

Project: Using Apache Hudi Deltastreamer and AWS DMS Hands on Labs

License: Apache-2.0 | Stargazers: 3 | Issues: 3 | Issues: 0

An-easy-to-use-Python-utility-class-for-accessing-incremental-data-from-Hudi-Data-Lakes

An easy-to-use Python utility class for accessing incremental data from Hudi Data Lakes

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 2 | Issues: 0

docker_compose_glue4.0

docker_compose_glue3.0

Language: Python | Stargazers: 1 | Issues: 0 | Issues: 0

hudi

Upserts, Deletes And Incremental Processing on Big Data.

License: Apache-2.0 | Stargazers: 1 | Issues: 0 | Issues: 0

Sending-Weekly-Daily-CSV-Reports-FROM-Hudi-Datalake-to-Customers-via-Email-using-Glue-and-SNS-OR-SES

Sending Weekly/Daily CSV Reports from Hudi Datalake to Customers via Email using Glue and SNS or SES

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 2 | Issues: 0