sathya-reddy-m

sathya-reddy-m's repositories

bigdata-docker-compose

Hadoop, Hive, Spark, Zeppelin and Livy: all in one Docker-compose file.

MIT

goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

MIT

pytest-spark

pytest plugin to run the tests with support of pyspark

MIT

marmaray

Generic Data Ingestion & Dispersal Library for Hadoop

NOASSERTION

streaminglens

Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines

Apache-2.0

Data-Quality-Automation

Data Quality Automation Tool

Apache-2.0

databricks-accelerators

Accelerate the use of Databricks for customers [public repo]

azure-docs

Open source documentation of Microsoft Azure

CC-BY-4.0

scalatest-embedded-kafka

A library that provides an in-memory Kafka instance to run your tests against.

MIT

sputnik

streaming-data-pipeline

Streaming pipeline repo for data engineering training program

sparta

Real Time Analytics and Data Pipelines based on Spark Streaming

Apache-2.0

mist

Serverless proxy for Spark cluster

Apache-2.0

ts-express-decorators

A TypeScript framework on top of Express. It provides many decorators and guidelines for writing your code.

Language: TypeScript

MIT

CoolplaySpark

Coolplay Spark: Spark source code analysis, Spark libraries, and more

continuous-intelligence-workshop

Repository with sample code and instructions for "Continuous Intelligence" and "Continuous Delivery for Machine Learning: CD4ML" workshops

Language: Python

MIT

ci-workshop-app

Language: Python

data_science_course_exercises

Language: Python

spark-streaming-with-kafka

Self-contained examples of Apache Spark streaming integrated with Apache Kafka.

NOASSERTION

bobcat

Data Generation R&D

spark-gotchas

Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks

NOASSERTION

big-data-architecture

killrweather

KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka for fast, streaming computations on time series data in asynchronous event-driven environments.

Apache-2.0

spark-flow

Library for organizing batch processing pipelines in Apache Spark

Apache-2.0

dplython

dplyr for python

MIT