Soumil Nitin Shah's repositories
StarRocks-Hudi-Minio
StarRocks+Hudi+Minio
hudi-minio-starrpcks-superset
hudi-minio-starrpcks-superset
apache-hudi-delta-streamer-labs
Apache Hudi Delta Streamer labs
Datalake-to-Microservices-Apache-Hudi-FastAPI-Spark
From Datalake to Microservices: Unleashing the Power of Apache Hudi's Record Level Index with FastAPI and Spark Connect
Dynamic-Hudi-Postgres-Ingestion
Dynamic Hudi Delta Streamer Jobs with JDBC Puller for PostgreSQL Tables, Bringing All Tables into Hudi and Running Jobs in Parallel
HUDI-Spark-DBT-Glue-Hive-Metastore-Run-Locally-
HUDI + Spark + DBT + Glue Hive Metastore Run Locally
Apache-Hudi-Table-Services-Hands-on-labs
Apache Hudi Table Services | Hands-on labs
aws-hudi-delta-iceberg-interoperability
aws-hudi-delta-iceberg-interoperability
Get-Started-with-Hudi-CLI-Locally-Using-Docker-in-Minutes-and-Connect-to-Your-S3-Data-
Get Started with Hudi CLI Locally Using Docker in Minutes and Connect to Your S3 Data
glue-dot-interactive-session-template
glue-dot-interactive-session-template
hudi-and-glue-locally
Apache Hudi and AWS Glue docker compose demo
insomnia-plugin-python-script
Python in Insomnia
Learn-How-to-Integerate-Hudi-Spark-job-with-Airflow-and-MinIO
Learn How to Integrate Hudi Spark job with Airflow and MinIO
one-table-with-deltastreamer
one table-with-deltastreamer
onetable-delta-multimodal-index-builder
onetable-delta-multimodal-index-builder
onetable-deltastreamer-glue
onetable-deltastreamer-glue
openhouse
Open Control Plane for Tables in Data Lakehouse
ruff
An extremely fast Python linter and code formatter, written in Rust.
Simplified-Delta-Streamer-Job-Management-A-Structured-Approach-for-Efficient-Data-Processing
Simplified Delta Streamer Job Management: A Structured Approach for Efficient Data Processing
Simplifying-Big-Data-Setting-Up-Spark-SQL-Hive-Thrift-Server-and-Hudi-with-Beeline-in-Minutes-
Simplifying Big Data: Setting Up Spark SQL, Hive Thrift Server, and Hudi with Beeline in Minutes
sling-cli
Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
sling-etl-cli-demo
sling-etl-cli-demo
sling-to-starrocks-demo
sling-to-starrocks-demo
sqlglot
Python SQL Parser and Transpiler
vectordb
A minimal Python package for storing and retrieving text using chunking, embeddings, and vector search.
xtable-with-emr-serverless
xtable-with-emr-serverless