sathya-reddy-m's repositories

bigdata-docker-compose

Hadoop, Hive, Spark, Zeppelin and Livy: all in one Docker-compose file.

License: MIT, Stargazers: 0, Issues: 0

goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

License: MIT, Stargazers: 0, Issues: 0

pytest-spark

pytest plugin for running tests with PySpark support

License: MIT, Stargazers: 0, Issues: 0

marmaray

Generic Data Ingestion & Dispersal Library for Hadoop

License: NOASSERTION, Stargazers: 0, Issues: 0

streaminglens

Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines

License: Apache-2.0, Stargazers: 0, Issues: 0

Data-Quality-Automation

Data Quality Automation Tool

License: Apache-2.0, Stargazers: 0, Issues: 0

databricks-accelerators

Accelerate the use of Databricks for customers [public repo]

Stargazers: 0, Issues: 0

azure-docs

Open source documentation of Microsoft Azure

License: CC-BY-4.0, Stargazers: 0, Issues: 0

scalatest-embedded-kafka

A library that provides an in-memory Kafka instance to run your tests against.

License: MIT, Stargazers: 0, Issues: 0

streaming-data-pipeline

Streaming pipeline repo for data engineering training program

Stargazers: 0, Issues: 0

sparta

Real Time Analytics and Data Pipelines based on Spark Streaming

License: Apache-2.0, Stargazers: 0, Issues: 0

mist

Serverless proxy for Spark cluster

License: Apache-2.0, Stargazers: 0, Issues: 0

ts-express-decorators

A TypeScript framework on top of Express. It provides many decorators and guidelines for writing your code.

Language: TypeScript, License: MIT, Stargazers: 0, Issues: 0

CoolplaySpark

Coolplay Spark: Spark source code analysis, Spark libraries, and more

Stargazers: 0, Issues: 0

continuous-intelligence-workshop

Repository with sample code and instructions for "Continuous Intelligence" and "Continuous Delivery for Machine Learning: CD4ML" workshops

Language: Python, License: MIT, Stargazers: 0, Issues: 0

spark-streaming-with-kafka

Self-contained examples of Apache Spark streaming integrated with Apache Kafka.

License: NOASSERTION, Stargazers: 0, Issues: 0

bobcat

Data Generation R&D

Stargazers: 0, Issues: 0

spark-gotchas

Spark Gotchas. A subjective compilation of Apache Spark tips and tricks

License: NOASSERTION, Stargazers: 0, Issues: 0

killrweather

KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka for fast, streaming computations on time series data in asynchronous event-driven environments.

License: Apache-2.0, Stargazers: 0, Issues: 0

spark-flow

Library for organizing batch processing pipelines in Apache Spark

License: Apache-2.0, Stargazers: 0, Issues: 0

dplython

dplyr for Python

License: MIT, Stargazers: 0, Issues: 0