sathya-reddy-m's repositories
cape-dataframes
Privacy transformations on Spark and Pandas dataframes backed by a simple policy language.
fig
Public issue tracker for Fig.
ide-best-practices
Best practices for working with Databricks from an IDE
metorikku
A simplified, lightweight ETL Framework based on Apache Spark
waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
smart-data-lake
Framework to quickly build and maintain Smart Data Lakes
openwhisk
Apache OpenWhisk is an open source serverless cloud platform
embedded-kafka
A library that provides an in-memory Kafka instance to run your tests against.
spark-extensions
Modified Spark code for SmartDataLakeBuilder
kafka
Mirror of Apache Kafka
Data-Engineering-Projects
Personal Data Engineering Projects
pynecone
🕸 Web apps in pure Python 🐍
hudi
Upserts, Deletes And Incremental Processing on Big Data.
dinky
Dinky is an out-of-the-box, one-stop real-time computing platform dedicated to the construction and practice of Unified Streaming & Batch and Unified Data Lake & Data Warehouse. Based on Apache Flink, Dinky can connect to many big data frameworks, including OLAP and Data Lake.
optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
mack
Delta Lake helper methods in Python
ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for accelerating ML workloads.
wtfjs
🤪 A list of funny and tricky JavaScript examples
corp
Assets related to the operation of Fishtown Analytics.
python-deequ
Python API for Deequ
aws-glue-libs
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
metaflow
🚀 Build and manage real-life data science projects with ease!
wtfpython
What the f*ck Python? 😱
awesome-spark
A curated list of awesome Apache Spark packages and resources.
co
Style guides and conventions
sdl-examples
Examples for Smart Data Lake
sparklint
A tool for monitoring and tuning Spark jobs for efficiency.