There are 3 repositories under structured-streaming topic.
酷玩 Spark (Coolplay Spark): Spark source code analysis, Spark libraries, and more
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
The Internals of Spark Structured Streaming
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Spark Structured Streaming / Kafka / Cassandra / Elastic
Kinesis Connector for Structured Streaming
Spark Connector to read and write with Pulsar
Custom state store providers for Apache Spark
Real-Time Financial Market Data Processing and Prediction application
Use Kafka and Apache Spark streaming to perform click stream analytics
Astronomy Broker based on Apache Spark
How to build your first Spark application with MLlib, StructuredStreaming, GraphFrames, Datasets and so on? Answer is here!
This repository contains the code base for the Open Stream Processing Benchmark.
Kafka offset committer for structured streaming query
Spark Structured Streaming State Tools
Spark Structured Streaming examples using version 3.5.1
Binding the GDELT universe in a Spark environment
This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed into any Spark compatible engine like Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.
Spark Streaming ETL jobs for Mozilla Telemetry
:high_brightness: A Spark self-study handbook covering Spark Core, Spark SQL, Spark Streaming, spark-kafka, and delta-lake, plus basic Scala exercises and source code analyses (e.g. master, shuffle), summaries, and translations.
A library for reading data from Amazon S3 with optimised listing using Amazon SQS, via Spark SQL Streaming (or Structured Streaming).
RocksDB state storage implementation for Structured Streaming.
Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines
Structured Streaming was applied to robot data from a ROS-Gazebo simulation environment using Apache Spark. Data is collected in Kafka, analyzed by Apache Spark, and stored in Cassandra.
Spark 3.0.0 Structured Streaming Kafka Avro Demo
Kafka + Structured Streaming + Phoenix + Elasticsearch: popular-item recommendation, user-preference recommendation, and a recall fusion strategy, implemented from behavior logs.
Structured Streaming is a reference application showing how to easily integrate Apache Spark Structured Streaming, Apache Cassandra and Apache Kafka for fast, structured streaming computations on data.
A tutorial on how to use pulsar-spark-connector
Sample Spark streaming code with custom listeners that push streaming metrics to Amazon CloudWatch
Ingests real-time Twitter API data into Kafka using tweepy, processes it with Apache Spark Structured Streaming and TextBlob sentiment analysis, then loads it into a time-series database (InfluxDB) with a Grafana monitoring dashboard
An example of how to create and use a Cassandra sink in a Spark Structured Streaming application
Real-time ETL pipeline for financial data (Kafka, PySpark).
Example code for an MQ data source implemented with the Spark 3.1.x data source API
Efficiently tackle large datasets and perform big data analysis with Spark and Python