sathya-reddy-m's repositories
bigdata-docker-compose
Hadoop, Hive, Spark, Zeppelin and Livy: all in one Docker-compose file.
goodreads_etl_pipeline
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
pytest-spark
pytest plugin for running tests with PySpark support
marmaray
Generic Data Ingestion & Dispersal Library for Hadoop
streaminglens
Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines
Data-Quality-Automation
Data Quality Automation Tool
databricks-accelerators
Accelerate the use of Databricks for customers [public repo]
azure-docs
Open source documentation of Microsoft Azure
scalatest-embedded-kafka
A library that provides an in-memory Kafka instance to run your tests against.
streaming-data-pipeline
Streaming pipeline repo for data engineering training program
sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
mist
Serverless proxy for Spark cluster
ts-express-decorators
A TypeScript framework on top of Express. It provides many decorators and guidelines for writing your code.
CoolplaySpark
Coolplay Spark: Spark source code analysis, Spark libraries, and more
continuous-intelligence-workshop
Repository with sample code and instructions for "Continuous Intelligence" and "Continuous Delivery for Machine Learning: CD4ML" workshops
spark-streaming-with-kafka
Self-contained examples of Apache Spark streaming integrated with Apache Kafka.
bobcat
Data Generation R&D
spark-gotchas
Spark Gotchas. A subjective compilation of Apache Spark tips and tricks
killrweather
KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka for fast, streaming computations on time series data in asynchronous event-driven environments.
spark-flow
Library for organizing batch processing pipelines in Apache Spark
dplython
dplyr for Python