sathya-reddy-m's repositories

cape-dataframes

Privacy transformations on Spark and Pandas dataframes backed by a simple policy language.

License: Apache-2.0, Stargazers: 1, Issues: 0

fig

Public issue tracker for Fig.

License: MIT, Stargazers: 0, Issues: 0

sparkly

Helpers & syntactic sugar for PySpark.

License: Apache-2.0, Stargazers: 1, Issues: 0

ide-best-practices

Best practices for working with Databricks from an IDE

License: Apache-2.0, Stargazers: 0, Issues: 0

metorikku

A simplified, lightweight ETL Framework based on Apache Spark

Language: Scala, License: MIT, Stargazers: 0, Issues: 0

waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.

Language: Scala, License: Apache-2.0, Stargazers: 0, Issues: 0

smart-data-lake

Framework to quickly build and maintain Smart Data Lakes

Language: Scala, License: GPL-3.0, Stargazers: 0, Issues: 0

openwhisk

Apache OpenWhisk is an open source serverless cloud platform

License: Apache-2.0, Stargazers: 0, Issues: 0

embedded-kafka

A library that provides an in-memory Kafka instance to run your tests against.

License: MIT, Stargazers: 0, Issues: 0

spark-extensions

Modified Spark code for SmartDataLakeBuilder

License: Apache-2.0, Stargazers: 0, Issues: 0

kafka

Mirror of Apache Kafka

License: Apache-2.0, Stargazers: 0, Issues: 0

Data-Engineering-Projects

Personal Data Engineering Projects

Stargazers: 0, Issues: 0

pynecone

🕸 Web apps in pure Python 🐍

License: Apache-2.0, Stargazers: 0, Issues: 0

hudi

Upserts, Deletes And Incremental Processing on Big Data.

License: Apache-2.0, Stargazers: 0, Issues: 0

dinky

Dinky is an out of the box one-stop real-time computing platform dedicated to the construction and practice of Unified Streaming & Batch and Unified Data Lake & Data Warehouse. Based on Apache Flink, Dinky provides the ability to connect many big data frameworks including OLAP and Data Lake.

License: Apache-2.0, Stargazers: 0, Issues: 0

optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

License: Apache-2.0, Stargazers: 0, Issues: 0

mack

Delta Lake helper methods in Python

Stargazers: 0, Issues: 0

ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for accelerating ML workloads.

License: Apache-2.0, Stargazers: 0, Issues: 0

wtfjs

🤪 A list of funny and tricky JavaScript examples

License: WTFPL, Stargazers: 0, Issues: 0

corp

Assets related to the operation of Fishtown Analytics.

License: Apache-2.0, Stargazers: 0, Issues: 0

python-deequ

Python API for Deequ

License: Apache-2.0, Stargazers: 0, Issues: 0

aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

License: NOASSERTION, Stargazers: 0, Issues: 0

metaflow

🚀 Build and manage real-life data science projects with ease!

License: Apache-2.0, Stargazers: 0, Issues: 0

wtfpython

What the f*ck Python? 😱

License: WTFPL, Stargazers: 0, Issues: 0

awesome-spark

A curated list of awesome Apache Spark packages and resources.

License: CC0-1.0, Stargazers: 0, Issues: 0

co

Style guides and conventions

Stargazers: 0, Issues: 0

sdl-examples

Examples for Smart Data Lake

License: GPL-3.0, Stargazers: 0, Issues: 0

sparklint

A tool for monitoring and tuning Spark jobs for efficiency.

License: Apache-2.0, Stargazers: 0, Issues: 0