a-romero/mapr-streams-sparkstreaming-hbase

Commands to run :

Step 1: First compile the project on eclipse: Select project  -> Run As -> Maven Install

Step 2: use scp to copy the ms-sparkstreaming-1.0.jar to the mapr sandbox or cluster

also use scp to copy the data sensor.csv file from the data folder to the cluster
put this file in a folder called data. The producer reads from this file to send messages.

scp  ms-sparkstreaming-1.0.jar user01@ipaddress:/user/user01/.
if you are using virtualbox:
scp -P 2222 ms-sparkstreaming-1.0.jar user01@127.0.0.1:/user/user01/.

Create the topic

maprcli stream create -path /user/user01/pump -produceperm p -consumeperm p -topicperm p
maprcli stream topic create -path /user/user01/pump -topic sensor -partitions 3

get info on the topic
maprcli stream info -path /user/user01/pump


To run the MapR Streams Java producer and consumer:

java -cp ms-sparkstreaming-1.0.jar:`mapr classpath` solution.MyProducer

java -cp ms-sparkstreaming-1.0.jar:`mapr classpath` solution.MyConsumer

Step 3:  To run the Spark  streaming consumer:

start the streaming app
 
spark-submit --class solution.SensorStreamConsumer --master local[2] ms-sparkstreaming-1.0.jar

ctl-c to stop 

step 4: Spark streaming app writing to hbase

first make sure that you have the correct version of HBase installed, it should be 1.1.1:

cat /opt/mapr/hbase/hbaseversion 
1.1.1

Next make sure the Spark HBase compatibility version is correctly configured here: 
cat  /opt/mapr/spark/spark-1.5.2/mapr-util/compatibility.version 
hbase_versions=1.1.1

If this is not 1.1.1 fix it.


Create an hbase table to write to:
launch the hbase shell
$hbase shell

create '/user/user01/sensor', {NAME=>'data'}, {NAME=>'alert'}, {NAME=>'stats'}

Step 4: start the streaming app writing to HBase
 
spark-submit --class solution.HBaseSensorStream --master local[2] ms-sparkstreaming-1.0.jar

ctl-c to stop

Scan HBase to see results:

$hbase shell

scan '/user/user01/sensor' , {'LIMIT' => 5}

scan '/user/user01/sensor' , {'COLUMNS'=>'alert',  'LIMIT' => 50}

Step 5:
exit, run spark (not streaming) app to read from hbase and write daily stats 

spark-submit --class solution.HBaseReadRowWriteStats --master local[2] ms-sparkstreaming-1.0.jar

$hbase shell

scan '/user/user01/sensor' , {'COLUMNS'=>'stats',  'LIMIT' => 50}


cleanup if you want to re-run:

maprcli stream topic delete -path /user/user01/pump -topic sensor 
maprcli stream topic create -path /user/user01/pump -topic sensor -partitions 3
hbase shell 
truncate '/user/user01/sensor'
a-romero / mapr-streams-sparkstreaming-hbase

About

Languages