There are 29 repositories under the streaming-data topic.
A curated list of awesome big data frameworks, resources and other awesomeness.
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Fancy stream processing made operationally mundane
The data warehouse for operational workloads.
Readyset is a MySQL and Postgres wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput. Under the hood, Readyset caches the results of SELECT statements and incrementally updates these results over time as the underlying data changes.
Utils for streaming large files (S3, HDFS, gzip, bz2...)
Lean and mean distributed stream processing system written in Rust and WebAssembly.
Open-source graph database, tuned for dynamic analytics environments. Easy to adopt, scale and own.
A lightweight stream processing library for Go
100% Python stream processing with Streaming DataFrames
⚡ Single-pass algorithms for statistics
A machine learning package for streaming data in Python. The other ancestor of River.
HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.
A list about Apache Kafka
🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams
Full stack application platform for building stateful microservices, streaming APIs, and real-time UIs
Optimal binning: monotonic binning with constraints. Supports batch and stream optimal binning, scorecard modelling, and counterfactual explanations.
Downloading images from the web is as easy as right-clicking them and selecting "Save image as...", right? Well, not anymore xD
Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
A real-time interactive web app based on data pipelines using streaming Twitter data, automated sentiment analysis, and MySQL and PostgreSQL databases (deployed on Heroku)
Source code for the book Kafka Streams in Action
Visualize and graph data in the terminal
Streaming Anomaly Detection Framework in Python (Outlier Detection for Streaming Data)
Data stream analytics: implements online learning methods to address concept drift and model drift in data streams using the River library. Code for the paper "PWPAE: An Ensemble Framework for Concept Drift Adaptation in IoT Data Streams", published in IEEE GLOBECOM 2021.