Spark with Java

Simple map and reduce operations with Java 8 and Scala 2.11, simple streaming from a socket, and Kafka streaming.
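
To make the map and reduce shape concrete, here is a minimal sketch in plain Java 8 streams. It deliberately does not use the Spark API, so it runs without Spark on the classpath. The class and method names (MapReduceSketch, countWords, totalLength) are hypothetical and are not taken from this repository.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical class name; plain Java 8 streams, not the Spark API.
final class MapReduceSketch {

    // Map step: split each line into lower-case words.
    // Reduce step: count occurrences per word (sorted by word for stable output).
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .map(String::toLowerCase)
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    // Map each word to its length, then reduce the lengths to a single total.
    static int totalLength(List<String> words) {
        return words.stream().map(String::length).reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("spark with java", "java with kafka");
        System.out.println(countWords(lines));
        System.out.println(totalLength(Arrays.asList("spark", "java")));
    }
}
```

A Spark job would presumably express the same two steps on a distributed dataset instead of an in-memory list.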

Socket streaming

We need to open a listening socket and then start the Java application.

nc -lk 9999
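
As a rough illustration of what the application side does with that socket, the sketch below connects to the listener as a client, reads newline-delimited text, and counts words. It uses only the Java standard library and is not the Spark streaming code of this project. The class and method names are hypothetical, and the host and port are assumed to match the nc command above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class name; a plain-Java stand-in for a socket text receiver.
final class SocketWordCountSketch {

    // Connects to host:port, reads lines until the peer closes the connection,
    // and returns the number of occurrences of each whitespace-separated word.
    static Map<String, Integer> countWordsFromSocket(String host, int port) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (Socket socket = new Socket(host, port);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Assumes `nc -lk 9999` is already listening on this machine.
        // Type some lines into nc, then stop it (Ctrl+C) to end the read loop.
        System.out.println(countWordsFromSocket("localhost", 9999));
    }
}
```

Unlike this one-shot loop, a streaming job would keep the connection open and process the incoming lines in small batches.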

Kafka streaming

  • Start a Kafka broker
docker run -d -p 2181:2181 -p 9092:9092 --env ADVERTISED_HOST=localhost --env ADVERTISED_PORT=9092 -v /DIR_TO_LOCAL_FILES:/data --name local-kafka spotify/kafka
  • Open a shell inside the Kafka container
docker exec -it local-kafka bash
  • Kafka commands to create the spark topic and to produce messages to it
/opt/kafka_2.11- --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic spark
/opt/kafka_2.11- --broker-list localhost:9092 --topic spark
  • Send a batch of messages from a file, one message per line
/opt/kafka_2.11- --broker-list localhost:9092 --topic spark < FILE_WITH_MESSAGES
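
If no message file is at hand, one can be generated. The sketch below writes a text file with one message per line, which is the shape the redirect above expects. The class name, the file name messages.txt, and the message text are all made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of this repository.
final class MessageFileSketch {

    // Writes `count` lines of the form "<prefix> <index>" to `target`.
    // Each line is meant to become one Kafka message.
    static Path writeMessages(Path target, String prefix, int count) throws IOException {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            lines.add(prefix + " " + i);
        }
        return Files.write(target, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name and message text.
        Path file = writeMessages(Paths.get("messages.txt"), "spark message", 100);
        System.out.println("Wrote " + file.toAbsolutePath());
    }
}
```

The resulting file can then take the place of FILE_WITH_MESSAGES, provided it is visible inside the container, for example through the /data volume mounted when the broker was started.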

HBase

  • Download HBase from the official site.
  • Start HBase with its start script. Make sure no other process is listening on port 2181 (for example, the Kafka container above publishes that port).
  • Start the HBase shell
./bin/hbase shell
  • Useful commands for the HBase shell (create takes column family names only; columns are created on write)
create 'my-table', 'family1', ..., 'family[n]'
scan 'my-table', {FILTER => "SingleColumnValueFilter('family','column',=,'binary:value')"}
  • Initial setup for this example
create 'words', 'word', 'count', 'date'


