
SDSC "hadoop" roll

Overview

This roll bundles the Hadoop distributed processing package and the myHadoop add-on.

For more information about the packages included in the hadoop roll, please visit their official web pages:

  • Hadoop is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
  • myHadoop is a simple system for end-users to provision Hadoop instances on traditional supercomputing resources, without requiring any root privileges. Users may use myHadoop to configure and instantiate Hadoop on the fly via regular batch scripts (a sketch of such a script follows this list).
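
The following is a minimal sketch of such a batch script. It assumes a SLURM scheduler, the environment module installed by this roll, and the myhadoop-configure.sh/myhadoop-cleanup.sh helpers together with Hadoop 1.x-style start-all.sh/stop-all.sh scripts; exact script names, paths, and the example jar depend on the versions bundled by the roll:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --time=00:30:00

# Load the Hadoop environment installed by this roll
module load hadoop

# Per-job Hadoop configuration directory (path is illustrative)
export HADOOP_CONF_DIR=$HOME/hadoop-conf.$SLURM_JOBID

# Provision a personal Hadoop instance on the allocated nodes
myhadoop-configure.sh -c $HADOOP_CONF_DIR
start-all.sh

# Run a sample MapReduce job (example jar name is illustrative)
hadoop jar $HADOOP_HOME/hadoop-examples-*.jar pi 4 1000

# Shut the instance down and clean up
stop-all.sh
myhadoop-cleanup.sh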

Requirements

To build/install this roll you must have root access to a Rocks development machine (e.g., a frontend or development appliance).

If your Rocks development machine does not have Internet access, you must download the appropriate hadoop source file(s) using a machine that does have Internet access and copy them into the src/<package> directories on your Rocks development machine.
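
For example, on a machine with Internet access you might fetch the Hadoop tarball and copy it over (the version number, host name, and destination path below are illustrative; check the roll's src/ makefiles for the file names it expects):

% wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
% scp hadoop-2.6.0.tar.gz root@devmachine:/path/to/hadoop-roll/src/hadoop/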

Dependencies

The sdsc-roll must be installed on the build machine, since the build process depends on make include files provided by that roll.
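
A quick way to confirm that the sdsc-roll is present on the build machine is to list the installed rolls (standard Rocks command; output format may vary by Rocks version):

% rocks list roll | grep sdsc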

Building

To build the hadoop-roll, execute this on a Rocks development machine (e.g., a frontend or development appliance):

% make 2>&1 | tee build.log

A successful build will create the file hadoop-*.disk1.iso. If you built the roll on a Rocks frontend, proceed to the installation step. If you built the roll on a Rocks development appliance, you need to copy the roll to your Rocks frontend before continuing with installation.
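
For example, from a development appliance the ISO can be copied with scp (the host name and destination directory are illustrative):

% scp hadoop-*.disk1.iso root@frontend:/root/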

Installation

To install, execute these instructions on a Rocks frontend:

% rocks add roll *.iso
% rocks enable roll hadoop
% cd /export/rocks/install
% rocks create distro

Subsequent installs of compute and login nodes will then include the contents of the hadoop-roll. To avoid cluttering the cluster frontend with unused software, the hadoop-roll is configured to install only on compute and login nodes. To force installation on your frontend, run this command after adding the hadoop-roll to your distro:

% rocks run roll hadoop host=NAME | bash

where NAME is the DNS name of a compute or login node in your cluster.

In addition to the software itself, the roll installs hadoop environment module files in:

/opt/modulefiles/applications/hadoop
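
Once a node has been installed with the roll, users can pick up the software through the standard environment modules commands (the exact module name/version string depends on the roll build):

% module avail hadoop
% module load hadoop
% hadoop version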

Testing

The hadoop-roll includes a test script that can be run to verify proper installation of the hadoop-roll documentation, binaries, and module files. To run the test script, execute the following command:

% /root/rolltests/hadoop.t 
