Ethan0456 / Hadoop-Installation

A Hadoop installation on Linux guide

Hadoop Installation Steps

Source

https://medium.com/@festusmorumbasi/installing-hadoop-on-ubuntu-20-04-4610b6e0391e

Prerequisite:

  • Download and install any Debian-based distribution of Linux (Xubuntu recommended) on VMware or VirtualBox

Steps:

  1. Create a separate hadoop user

sudo adduser hadoop

Error (which will be faced in step 5):

the hadoop user is not in the sudoers file

Solution:

sudo usermod -aG sudo hadoop
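To confirm the hadoop user is now in the sudo group, a quick check (assuming the user is named hadoop) is:

groups hadoop

The output should include sudo in the list of groups.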

  2. Install Java
  • First, update the package information

sudo apt update

  • Then install Java

sudo apt install openjdk-8-jdk -y

  • After installation, run the command below to check the Java version

java -version
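If the install succeeded, the reported version should be 1.8.x; the output typically looks something like the following (exact build strings will differ):

openjdk version "1.8.0_xxx"
OpenJDK Runtime Environment (build ...)
OpenJDK 64-Bit Server VM (build ..., mixed mode)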

  3. Install OpenSSH on Ubuntu

sudo apt install openssh-server openssh-client -y

  4. Switch from the current user to the newly created hadoop user

sudo su - hadoop

  5. Set up a passwordless SSH connection for the hadoop user
  • Generate a public/private key pair for the SSH connection

ssh-keygen -t rsa

  • (optional) If the present working directory is not the home directory, then run

cd

  • Then append the newly generated public key to the "authorized_keys" file

sudo cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

  • Change the permission of the “authorized_keys” file

sudo chmod 640 ~/.ssh/authorized_keys

In Linux you can change permissions using the numeric format as well as with symbolic flags like r, w and x together with the + and - operators to add and remove permissions.

Here 640 is the permission. In Linux each file has three permission groups:

owner, group, all users (others)

Each digit of this 3-digit number is the decimal form of a binary permission triplet:

  • 0 = ---
  • 1 = --x
  • 2 = -w-
  • 3 = -wx
  • 4 = r--
  • 5 = r-x
  • 6 = rw-
  • 7 = rwx

6 = 2^2 + 2^1 = 110 = rw-

4 = 2^2 = 100 = r--

0 = 000 = ---

For example:

  • chmod 777 foldername will give read, write, and execute permissions for everyone.
  • chmod 700 foldername will give read, write, and execute permissions for the user only.
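As a point of comparison, the 640 permission used above can also be written with symbolic flags; the following is an equivalent way to set owner read/write, group read, and nothing for others on the authorized_keys file:

chmod u=rw,g=r,o= ~/.ssh/authorized_keys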
  • After that, run the command below to connect to your own SSH server and add the machine to the known hosts

ssh localhost

  6. Install Hadoop from the official link

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.2/hadoop-3.3.2.tar.gz
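Optionally, before extracting, the download can be verified against the SHA-512 checksum that Apache publishes alongside the tarball (assuming the matching .sha512 file is still available at the same location):

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.2/hadoop-3.3.2.tar.gz.sha512
sha512sum -c hadoop-3.3.2.tar.gz.sha512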

  • Extract the above downloaded file

tar -xvzf hadoop-3.3.2.tar.gz

  • Rename the extracted directory to hadoop

mv hadoop-3.3.2 hadoop

  7. Configure Java environment variables for setting up Hadoop

  • First, find the directory where Java is installed:
dirname $(dirname $(readlink -f $(which java)))
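On a default openjdk-8 install this typically prints something like /usr/lib/jvm/java-8-openjdk-amd64/jre (the path may end in /jre because the java binary lives inside the JRE subdirectory); the JAVA_HOME exported in the next step uses the directory without the trailing /jre. A small sketch to capture it in a shell variable (the variable name is just for illustration):

JAVA_PATH=$(dirname $(dirname $(readlink -f $(which java))))
echo $JAVA_PATH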


Now we need to configure Hadoop

  • To configure Hadoop we need to edit several files:

.bashrc, hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml

  8. .bashrc
  • This is the configuration file for bash (Bourne Again Shell)
  • We need to add the paths below for Hadoop to work properly

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

  • To activate the above changes in the current shell, execute

source ~/.bashrc
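To confirm the new variables are in effect, something like the following should print the Hadoop home directory and the Hadoop version:

echo $HADOOP_HOME
hadoop version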

  9. hadoop-env.sh
  • nano is a CLI editor; you can use any other text editor as well, such as vim.

sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

  • Here we need to configure the JAVA_HOME variable so that Hadoop knows which Java installation to use.

Method 1: If you installed Java as described above, add the line below to this file

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Method 2: If you installed Java some other way

Find where Java is installed by executing

which java

  • Then run the command below to find the path of the OpenJDK directory

readlink -f /usr/bin/javac

  • Unix systems also have the concept of shortcuts (symbolic links); readlink gives the path of the actual file that a link points to


  • In the readlink output, the path before /bin/javac, i.e. /usr/lib/jvm/java-8-openjdk-amd64, is what needs to be added to hadoop-env.sh

Example: if your output is /usr/lib/jvm/java-11-openjdk-amd64/bin/javac, then add /usr/lib/jvm/java-11-openjdk-amd64 to the file.
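As a rough one-liner (assuming the readlink output ends in /bin/javac), the directory can be extracted directly:

readlink -f /usr/bin/javac | sed 's:/bin/javac$::'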

  10. core-site.xml

sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml

  • We need to edit this file to set the URL (host and port) of the NameNode, which HDFS clients use to reach it.
  • If the file already contains <configuration> tags, add the code below between them; if not, first add the <configuration> tags and then put the code below inside them.

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
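For reference, a minimal core-site.xml with the surrounding configuration tags would look roughly like this:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>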

  11. hdfs-site.xml

sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

  • Same as above, add the code below between the configuration tags.
  • This file defines the locations for storing node metadata, the fsimage file, and the edit log.
  • dfs.name.dir is the location where the NameNode data is stored.
  • dfs.data.dir is the location where the DataNode data is stored.
  • dfs.replication is set to 1 since this is a single-node cluster.

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
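Hadoop will normally create these directories itself, but they can also be created up front so the paths configured above definitely exist:

mkdir -p ~/hadoopdata/hdfs/namenode ~/hadoopdata/hdfs/datanode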

  12. mapred-site.xml

sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml

  • The code below changes the default MapReduce framework name to yarn

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

  13. yarn-site.xml

sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml

  • Add the lines below to this file, between the configuration tags

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

  14. Format the NameNode

hdfs namenode -format
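If formatting succeeds, the log output typically ends with a message saying the storage directory (the namenode path configured in hdfs-site.xml) has been successfully formatted.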

  15. Start the Hadoop Cluster

start-dfs.sh : starts the distributed file system daemons (NameNode, DataNode and SecondaryNameNode)

Error: start-dfs.sh may fail with an SSH connection error.

Solution:

cd hadoop/etc/hadoop

vim hadoop-env.sh

Paste the line below into this file

export HADOOP_SSH_OPTS="-p 22"

start-yarn.sh : starts YARN, the resource negotiator

jps : lists the running Java virtual machines on the machine

The output of the jps command should list the running Hadoop daemons.
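A typical listing (process IDs will differ) looks roughly like this:

12345 NameNode
12456 DataNode
12567 SecondaryNameNode
12678 ResourceManager
12789 NodeManager
12890 Jps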

  16. Access the Hadoop Web UI

After all these steps, we can monitor Hadoop from the web interface by opening the URL below in a browser

http://localhost:9870
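The YARN ResourceManager also exposes its own web UI, by default at http://localhost:8088, which is useful for checking that the ResourceManager and NodeManager started correctly.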
