Members and Sub-Topics:
- Sagar - Demonstrate Spouts and Topology.
- Susan - Demonstrate the installation and setup process.
- Saroj - Demonstrate Bolts.
- Jing - Demonstrate the architecture of Storm.
Click the video link to watch how to install and run Storm.
- Anaconda3
- Java
- Maven
- Zookeeper
I used Chocolatey to install Anaconda3, Java, and Maven.
Anaconda3
choco install anaconda3 -y
Java
choco install openjdk -y
Maven
choco install maven -y
Zookeeper
For Zookeeper, I used the same path as before, so I used the following command to start it:
.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties
Alternatively, you can download Zookeeper from the following link:
http://download.nextag.com/apache/zookeeper/zookeeper-3.3.6/
After downloading, use the following commands to configure and start Zookeeper:
> cd zookeeper-3.3.6
> copy conf\zoo_sample.cfg conf\zoo.cfg
> .\bin\zkServer.cmd
Storm
Download the zipped Storm distribution from the link below:
https://dl.dropboxusercontent.com/s/iglqz73chkul1tu/storm-0.9.1-incubating-SNAPSHOT-12182013.zip
Then extract it to C:\ using 7-Zip.
Set the following environment variables:
- Variable: M2_HOME, Value: C:\ProgramData\chocolatey\lib\maven\apache-maven-3.6.3
- Variable: JAVA_HOME, Value: C:\OpenJDK\jdk-15.0.1
- Variable: STORM_HOME, Value: C:\storm-0.9.1-incubating-SNAPSHOT-12182013
Make sure the files are stored at the same paths, or substitute your own. I have stored everything in the C:\ drive.
Then add the following entries to the Path variable:
- %JAVA_HOME%\bin
- %M2_HOME%\bin
- %STORM_HOME%\bin
- C:\tools\Anaconda3
- C:\tools\Anaconda3\Library\mingw-w64\bin
- C:\tools\Anaconda3\Library\usr\bin
- C:\tools\Anaconda3\Library\bin
- C:\tools\Anaconda3\Scripts
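The variables and Path entries above can also be set from an administrator command prompt instead of the System Properties GUI. This is a sketch assuming the same install paths as above; adjust them to match your machine.

```shell
:: Set the tool home directories (paths assume the defaults used above)
setx M2_HOME "C:\ProgramData\chocolatey\lib\maven\apache-maven-3.6.3"
setx JAVA_HOME "C:\OpenJDK\jdk-15.0.1"
setx STORM_HOME "C:\storm-0.9.1-incubating-SNAPSHOT-12182013"

:: Append the bin directories and the Anaconda3 entries to the user Path
setx Path "%Path%;%JAVA_HOME%\bin;%M2_HOME%\bin;%STORM_HOME%\bin;C:\tools\Anaconda3;C:\tools\Anaconda3\Library\mingw-w64\bin;C:\tools\Anaconda3\Library\usr\bin;C:\tools\Anaconda3\Library\bin;C:\tools\Anaconda3\Scripts"
```

Note that `setx` only affects command prompts opened after it runs, not the current one.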
Then start Nimbus, the Supervisor, and the Storm UI.
For each, open a separate command prompt.
Run one of the following commands in each command prompt:
storm nimbus
storm supervisor
storm ui
After running these, open http://localhost:8080/ in your browser; if you see the following screen, Storm is running.
Apache Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, is used by many companies, and is a lot of fun to use!
Nimbus is the master node of a Storm cluster. All other nodes in the cluster are called worker nodes. The master node is responsible for distributing data among the worker nodes, assigning tasks to them, and monitoring for failures.
Command:
storm nimbus
The nodes that follow the instructions given by Nimbus are called Supervisors. A supervisor runs multiple worker processes and governs them to complete the tasks assigned by Nimbus.
Command:
storm supervisor
Command:
storm ui
Link to video: https://app.vidgrid.com/view/43En4nFy4RNA
After completing all the processes above, let's talk about Spouts and Topology.
A topology is a graph of computation; we use it for realtime computation on Storm. Running a topology is straightforward. First, package all your code and dependencies into a single jar. Then run the following command:
storm jar all-my-code.jar org.apache.storm.MyTopology arg1 arg2
- While making a topology, first create a topology builder:
TopologyBuilder builder = new TopologyBuilder();
- After the topology builder is created, set a spout on it:
builder.setSpout(...)
- Then set a bolt on it:
builder.setBolt(...)
- After you set the spout and bolt, submit the topology:
submit()
- Finally, you can start the topology:
ConfigurableTopology.start(...)
A spout is a source of streams in a topology. Generally, spouts read tuples from an external source and emit them into the topology. Spouts can be either reliable or unreliable. A spout also has several methods:
open() - Called when a task for this component is initialized within a worker.
nextTuple() - When this method is called, Storm is requesting that the spout emit a tuple to the output collector.
ack() - Storm has determined that the tuple emitted by this spout with the given id has been fully processed.
fail() - The tuple emitted by this spout with the given id has failed to be fully processed.
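The spout lifecycle can be sketched in plain Java with no Storm dependency. The `SimpleSpout` interface and `ListSpout` class below are hypothetical stand-ins for Storm's spout interface, meant only to show when each of the four methods fires and how a reliable spout replays a failed tuple.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for Storm's spout interface, showing the four lifecycle methods.
interface SimpleSpout {
    void open();        // called once when the spout task is initialized
    String nextTuple(); // Storm asks the spout to emit its next tuple (null if none)
    void ack(int msgId);  // the tuple with this id was fully processed
    void fail(int msgId); // the tuple with this id failed; a reliable spout replays it
}

// A reliable spout that reads from an in-memory list and replays failed tuples.
class ListSpout implements SimpleSpout {
    private final Queue<String> pending = new ArrayDeque<>();
    final List<String> emitted = new ArrayList<>(); // index doubles as the message id
    private boolean opened = false;

    ListSpout(List<String> source) { pending.addAll(source); }

    @Override public void open() { opened = true; }

    @Override public String nextTuple() {
        if (!opened || pending.isEmpty()) return null;
        String tuple = pending.poll();
        emitted.add(tuple);
        return tuple;
    }

    @Override public void ack(int msgId) {
        // fully processed: nothing to replay
    }

    @Override public void fail(int msgId) {
        // reliable behavior: re-queue the failed tuple so it is emitted again
        pending.add(emitted.get(msgId));
    }
}

public class SpoutDemo {
    public static void main(String[] args) {
        ListSpout spout = new ListSpout(List.of("hello storm", "hello world"));
        spout.open();
        System.out.println(spout.nextTuple()); // hello storm
        spout.fail(0);                         // pretend processing failed -> replay
        System.out.println(spout.nextTuple()); // hello world
        System.out.println(spout.nextTuple()); // hello storm (replayed)
    }
}
```

In real Storm, `nextTuple` emits through an output collector rather than returning a value, but the ack/fail bookkeeping works the same way.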
Video Link: Click here to open video
Bolts are where all the processing in a topology is done. Processing such as filtering, functions, aggregations, joins, and talking to databases can all be done in bolts. If you want your topology to emit more than one stream, you can do that in a bolt.
execute() - the main and most important method, where you do the computation and emit values into the topology.
In our example we do a word count: sentences are created in the spout, split into words in the split bolt, and counted in the count bolt. For example, if you were to do the equivalent of a MapReduce job, you would fetch the data in the spout, do the map, sort, and reduce in their own bolts, and finally bind them together in the topology.
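The word-count dataflow can be illustrated with a stdlib-only Java sketch: a fixed list of sentences stands in for the spout, and two plain methods stand in for what the split bolt and count bolt each compute. This is not the Storm API, just the per-bolt logic under those assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stdlib-only sketch of the word-count dataflow: the "spout" supplies sentences,
// the "split bolt" breaks them into words, and the "count bolt" tallies each word.
public class WordCountFlow {
    // Split-bolt logic: one sentence in, a stream of words out.
    static List<String> splitBolt(String sentence) {
        return Arrays.asList(sentence.toLowerCase().split("\\s+"));
    }

    // Count-bolt logic: keep a running tally per word.
    static void countBolt(String word, Map<String, Integer> counts) {
        counts.merge(word, 1, Integer::sum);
    }

    public static void main(String[] args) {
        // Spout: a fixed source of sentences for the demo.
        List<String> sentences = List.of(
                "the cow jumped over the moon",
                "the man went to the store");
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sentence : sentences) {
            for (String word : splitBolt(sentence)) {
                countBolt(word, counts);
            }
        }
        System.out.println(counts.get("the")); // 4
    }
}
```

In the real topology each bolt runs in its own worker processes and tuples flow between them over the cluster; here the two loops play the role of that wiring.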
Assuming you have Zookeeper, Nimbus, the Supervisor, and the UI running, run the following commands from the project's root directory:
mvn clean compile assembly:single
- This first cleans the project, compiles it, and creates a target folder containing a fat jar.
storm jar target/storm-demo-0.0.1-SNAPSHOT-jar-with-dependencies.jar edu.nwmissouri.bigdatastorm.Topology WordCount -c nimbus.host=localhost
- This deploys the WordCount topology to your local cluster with the storm jar command.
- https://www.tutorialspoint.com/apache_storm/apache_storm_cluster_architecture.htm
- https://blog.knoldus.com/apache-storm-architecture/
- https://storm.apache.org/releases/current/Tutorial.html
- https://storm.apache.org/releases/current/Concepts.html
- https://www.youtube.com/watch?v=5kiZs1a8UPM&ab_channel=edureka%21
- http://ptgoetz.github.io/blog/2013/12/18/running-apache-storm-on-windows/
- https://github.com/apache/storm/tree/master/examples/storm-starter