ezhaar / spark-openstack

Setup Spark Cluster on OpenStack


Setup a Spark Cluster with Hadoop and YARN.

PreReq

Use the scripts in https://www.github.com/ezhaar/spark-installer to install hadoop, yarn and spark. You will also need the Python package fabric, which can be installed with pip or your distro's package manager. We prefer pip.

sudo pip install fabric

P.S. Ever heard of virtualenv? It's awesome!!
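If you want to keep fabric out of your system Python, the install can be done inside a virtual environment — a minimal sketch, assuming Python 3 with the venv module ("sparkenv" is just an example name):

```shell
# Create an isolated environment and install fabric into it.
python3 -m venv sparkenv
. sparkenv/bin/activate
pip install fabric
```

Everything installed this way stays under the sparkenv directory; delete it to undo the install.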

Get Help

./spark-openstack -h

Launch Cluster

./spark-openstack --keyname myKey --slaves 2 --flavor m1.large \
--image spark090-img --cluster_name clusterName launch
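The launcher's interface, as used above, could be modeled with Python's argparse along these lines — a sketch mirroring the flags shown in this README, not the project's actual parser:

```python
import argparse

# Hypothetical model of the spark-openstack CLI; flag names are taken
# from the commands in this README, defaults and help text are assumptions.
def build_parser():
    p = argparse.ArgumentParser(prog="spark-openstack",
                                description="Setup a Spark cluster on OpenStack")
    p.add_argument("--keyname", help="OpenStack keypair to install on the nodes")
    p.add_argument("--slaves", type=int, help="number of slave nodes")
    p.add_argument("--flavor", help="instance flavor, e.g. m1.large")
    p.add_argument("--image", help="image to boot, e.g. spark090-img")
    p.add_argument("-c", "--cluster_name", help="name of the cluster")
    p.add_argument("action", choices=["launch", "destroy"])
    return p

args = build_parser().parse_args(
    ["--keyname", "myKey", "--slaves", "2", "--flavor", "m1.large",
     "--image", "spark090-img", "--cluster_name", "clusterName", "launch"])
print(args.action, args.slaves)  # launch 2
```

The same parser handles the destroy form shown later (`-c clusterName destroy`), since `-c` is an alias for `--cluster_name`.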

Once all machines have booted, log in to the master and run the fabric command to list all the options:

fab -l

Since this is the first login, initialize the cluster:

fab init_cluster

This script will:

  • Copy the hadoop, yarn and spark configuration files to all the nodes
  • Format hadoop's namenode
  • Start hadoop
  • Create user directories in HDFS
  • Start yarn
  • Start spark master and slaves
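The steps above boil down to an ordered sequence of shell commands issued from the master. A sketch of that sequence (script and path names assume a typical Hadoop 2.x / Spark layout — they are illustrations, not taken from the actual fabfile):

```python
# Illustrative ordering of the commands a task like init_cluster would run.
# "<user>" is a placeholder for the HDFS user; script names are assumptions.
INIT_STEPS = [
    "rsync hadoop/yarn/spark conf files to all nodes",  # push configuration
    "hdfs namenode -format",           # format the namenode (wipes HDFS data!)
    "start-dfs.sh",                    # start hadoop (HDFS daemons)
    "hdfs dfs -mkdir -p /user/<user>", # create user directories in HDFS
    "start-yarn.sh",                   # start yarn
    "start-master.sh; start-slaves.sh" # start spark master and slaves
]
for step in INIT_STEPS:
    print(step)
```

The ordering matters: HDFS must be formatted and running before the user directories can be created, and yarn/spark come up last.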

The fabric file also contains options to start and stop hadoop/yarn, as well as to reset the cluster to its initial state.

fab start_hadoop

Now you should be able to access the web UIs:

  • http://<master-ip>:50070 for the namenode
  • http://<master-ip>:50075 for the datanode
  • http://<master-ip>:8088 for the resource manager

Destroy Cluster

./spark-openstack -c clusterName destroy
