hadoop install

steps and nececry files for installing hadoop + yarn 2.6 on ubuntu 14.10 (from http://releases.ubuntu.com/14.10/ubuntu-14.10-desktop-amd64.iso)

I collected many instructions as I could (see the refs below) but select the steps I like and put them here (It is kind of like cherry pick). Those steps are tested on my hadoop cluster. It works perfect. Three big steps: install packages and config them and hadoop xml files. I used tmux with the function of synchronize-panes for setting all the machines.

##machines

pocoyo-1 192.168.1.72 (master)
pocoyo-2 192.168.1.52 (data node)
pocoyo-3 192.168.1.44 (data node)

edit host

vi /etc/hostname
- check machine name, for each machine, for example, you can modify them if you want
  - pocoyo-1

sudo vi /etc/hosts

add folowing lines, for each machine or use scp to others

127.0.0.1 localhost
192.168.1.72 pocoyo-1 # nameNode
192.168.1.52 pocoyo-2 # secondary namdNode
192.168.1.44 pocoyo-3  # data node

*sudo scp 192.168.1.72:/etc/hosts /etc/hosts (run this on slaves)

##creat hadoop user and user group for each machine

sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo
sudo chown -R hduser:hadoop /usr/local/

##install ssh for each machine (the following is not a secure way but it faster for test purpose)

su - hduser
sudo apt-get intall openssh-server
ssh localhost

on master

ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

on slaves (pocoyo-2 and pocoyo-3)

mkdir .ssh

on master

ssh-copy-id hduser@pocoyo-2 (do the same for pocoyo-3)
ssh hduser@pocoyo-2
ssh hduser@pocoyo-3

##disable ipv6 for each machine (:setw synchronize-panes in tmux worked for me)

sudo vi /etc/sysctl.conf

add following lines

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

##### run
* sudo service networking restart 

##download hadoop for each machine 
(once one dowloaded you can use scp to copy to others)
* su - hduser
* cd /usr/local
* wget http://mirror.reverse.net/pub/apache/hadoop/common/stable2/hadoop-2.6.0.tar.gz 
* tar -xzf hadoop-2.6.0.tar.gz
* ln -s /usr/local/hadoop-2.6.0 /usr/local/hadoop

##install java 1.7 for all machines. 

(once one dowloaded you can use scp to copy to others)

we select 1.7 because it is reported on http://wiki.apache.org/hadoop/HadoopJavaVersions

* su - hduser
* cd cd /usr/local
* wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/7u75-b13/jdk-7u75-linux-x64.tar.gz"
* tar -xzf jdk-7u75-linux-x64.tar.gz
* ln -s /usr/local/jdk-7u75-linux-x64 /usr/local/jdk

## edit /etc/profile for master
(:setw synchronize-panes in tmux worked for me)

* sudo vi /etc/profile
  * add following lines
```sh
  export HADOOP_HOME=/usr/local/hadoop
  export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
  export JAVA_HOME=/usr/local/jdk
  export CLASSPATH=$JAVA_HOME/lib/tools.jar
  export PATH=$JAVA_HOME/bin:$PATH

source /etc/profile
java -version ( to test)

on slaves

sudo scp hduser@pocoyo-1:/etc/profile /etc/profile
source /etc/profile

##config hadoop xml files.

modify $HADOOP_HOME/etc/hadoop/hadoop-env.sh for all machines add

export JAVA_HOME=/usr/local/jdk

modify $HADOOP_HOME/etc/hadoop/slaves for all machines

  pocoyo-1
  pocoyo-2
  pocoyo-3

copy xml files first

cd $HADOOP_HOME
cp ./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml ./etc/hadoop/core-site.xml
cp ./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml ./etc/hadoop/hdfs-site.xml
cp ./share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml ./etc/hadoop/yarn-site.xml
cp ./share/doc/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml ./etc/hadoop/mapred-site.xml

modify core-site.xml

property	value	machines
fs.defaultFS	hdfs://pocoyo-1:9000	all
hadoop.tmp.dir	/usr/local/hadoop/tmp	all
io.file.buffer.size	131072	all

modify hdfs-site.xml

property	value	machines
dfs.namenode.rpc-address	pocoyo-1:9001	all
dfs.namenode.secondary.http-address	pocoyo-2:50090	namenode and seconday nameNode
dfs.namenode.name.dir	/usr/local/hadoop/dfs/name	namenode and seconday nameNode
dfs.datanode.data.dir	/usr/local/hadoop/data	datanodes