capnomad / kafka-pyspark-realtime-dashboard

Real-time report dashboard with Apache Kafka, Apache Spark Streaming and Node.js

Realtime Dashboard


Getting started

1. Set up the environment

Clone this project

git clone
cd realtime-dashboard/

# Setup env (the commands below assume RRD_HOME points at the project root)

Download Apache Spark 2.2.0

tar -xzf spark-2.2.0-bin-hadoop2.7.tgz
export SPARK_HOME=$RRD_HOME/spark-2.2.0-bin-hadoop2.7

Download Kafka

tar -xzf kafka_2.11-1.0.0.tgz
export KAFKA_HOME=$RRD_HOME/kafka_2.11-1.0.0

Install Node.js packages

npm install
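The later steps depend on `RRD_HOME`, `SPARK_HOME` and `KAFKA_HOME` being exported and pointing at real directories. A minimal sketch of a pre-flight check is shown below; the helper name `missing_env` is hypothetical and not part of this project.

```python
import os

# Variable names taken from the commands in this README.
REQUIRED_VARS = ("RRD_HOME", "SPARK_HOME", "KAFKA_HOME")


def missing_env(environ=os.environ, required=REQUIRED_VARS):
    """Return the required variables that are unset or do not point at a directory."""
    problems = []
    for name in required:
        path = environ.get(name)
        if not path or not os.path.isdir(path):
            problems.append(name)
    return problems


if __name__ == "__main__":
    bad = missing_env()
    if bad:
        print("Fix these variables before continuing: " + ", ".join(bad))
    else:
        print("Environment looks ready.")
```

Running it before starting Kafka or submitting the Spark job catches a forgotten `export` early.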

2. Start Kafka Server

Start Zookeeper and Kafka

cd $RRD_HOME/kafka_2.11-1.0.0

# Start zookeeper
bin/ config/ &

# Start Kafka
bin/ config/ &

Create Topics

bin/ --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic website-collect
bin/ --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic website-report

Kafka cannot be accessed directly over HTTP, so start the Node.js Kafka proxy:

node nodejs-kafka-proxy/server.js

# [2017-11-16 14:24:03,008] INFO Accepted socket connection from / (org.apache.zookeeper.server.NIOServerCnxnFactory)
# [2017-11-16 14:24:03,010] WARN Connection request from old client /; will be dropped if server is in r-o mode (org.apache.zookeeper.server.ZooKeeperServer)
# [2017-11-16 14:24:03,010] INFO Client attempting to establish new session at / (org.apache.zookeeper.server.ZooKeeperServer)
# [2017-11-16 14:24:03,025] INFO Established session 0x15fc38ffab40011 with negotiated timeout 30000 for client / (org.apache.zookeeper.server.ZooKeeperServer)
# Example app listening on port 3000!

Test the Kafka producer and consumer (optional)

Open two terminals:

# Terminal 1
$ bin/ --broker-list localhost:9092 --topic website-collect
This is a message
This is another message
{"client_id": "", "time": "1510736940", "event": "view", "ip":"", "UA": "Chrome"}
{"client_id": "", "time": "1510736940", "event": "click", "ip":"", "UA": "Firefox"}
# Terminal 2
$ bin/ --bootstrap-server localhost:9092 --topic website-collect --from-beginning
This is a message
This is another message
{"client_id": "", "time": "1510736940", "event": "view", "ip":"", "UA": "Chrome"}
{"client_id": "", "time": "1510736940", "event": "click", "ip":"", "UA": "Firefox"}
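The test stream above mixes plain-text lines with JSON events, so anything reading `website-collect` has to tolerate both. The sketch below shows one way to separate them, assuming every real event carries the five fields seen in the samples; `parse_event` is a hypothetical helper, not code from this project.

```python
import json

# Field names copied from the sample events above.
EVENT_FIELDS = ("client_id", "time", "event", "ip", "UA")


def parse_event(line):
    """Return the event as a dict, or None if the line is not a well-formed event."""
    try:
        record = json.loads(line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(record, dict):
        return None
    if any(field not in record for field in EVENT_FIELDS):
        return None
    return record


lines = [
    "This is a message",
    '{"client_id": "", "time": "1510736940", "event": "view", "ip":"", "UA": "Chrome"}',
    '{"client_id": "", "time": "1510736940", "event": "click", "ip":"", "UA": "Firefox"}',
]
events = [e for e in map(parse_event, lines) if e is not None]
print([e["event"] for e in events])
```

Returning `None` instead of raising keeps a single malformed message from stopping a long-running consumer.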

Test proxy server:
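The original request example is not preserved in this copy. As a stand-in, here is a minimal sketch that builds an event in the same JSON shape as the producer test and wraps it in an HTTP POST. Port 3000 comes from the proxy's startup log; the `/collect` path and the `build_event_request` helper are assumptions, so check `nodejs-kafka-proxy/server.js` for the real route.

```python
import json
import time
import urllib.request

# Port 3000 comes from the proxy's startup log; the "/collect" path is a
# hypothetical placeholder, not taken from the project.
PROXY_URL = "http://localhost:3000/collect"


def build_event_request(event, user_agent, client_id="", ip="", url=PROXY_URL):
    """Build (but do not send) a POST request carrying one JSON event."""
    payload = {
        "client_id": client_id,
        "time": str(int(time.time())),  # epoch seconds as a string, like the samples
        "event": event,
        "ip": ip,
        "UA": user_agent,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_event_request("view", "Chrome")
print(request.get_method(), request.full_url)
# To actually send it once the proxy is running:
# urllib.request.urlopen(request)
```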


The message then appears in the Kafka consumer:

3. Apache Spark Streaming

Submit Spark Streaming script

# Usage: <zk> <input_topic> <output_topic>

$SPARK_HOME/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 \
    $RRD_HOME/spark/ \
    localhost:2181 website-collect website-report
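The streaming script itself is not reproduced here. As an illustration only, the sketch below shows the kind of per-batch aggregation such a job could apply to messages from `website-collect` before publishing to `website-report`. It is plain Python, so it runs without Spark; the idea that the report counts events by type is an assumption, and `summarize_batch` is a hypothetical name.

```python
import json
from collections import Counter


def summarize_batch(lines):
    """Count events by type in one batch of raw messages, skipping non-JSON lines."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # plain-text test messages are ignored
        if isinstance(record, dict) and isinstance(record.get("event"), str):
            counts[record["event"]] += 1
    # Serialize with sorted keys so the report is stable across batches.
    return json.dumps(dict(counts), sort_keys=True)


batch = [
    "This is a message",
    '{"client_id": "", "time": "1510736940", "event": "view", "ip":"", "UA": "Chrome"}',
    '{"client_id": "", "time": "1510736940", "event": "click", "ip":"", "UA": "Firefox"}',
    '{"client_id": "", "time": "1510736941", "event": "view", "ip":"", "UA": "Firefox"}',
]
print(summarize_batch(batch))
```

In a Spark Streaming job, logic like this would typically run once per micro-batch, with the resulting JSON string sent to the output topic.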


License: MIT License

Languages: Python 60.0%, JavaScript 24.4%, Shell 15.5%