zar3bski / hadoop-sandbox

Teaching purpose, demo, data and stack


Prerequisites

You'll need a Docker engine and docker-compose.
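
To check that both are available on your machine:

docker --version
docker-compose --version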

Setup

Clone this repo

Add a .env file at the root

CLUSTER_NAME=the_name_of_your_cluster
ADMIN_NAME=your_name
ADMIN_PASSWORD=def@ultP@ssw0rd
INSTALL_PYTHON=true # whether to install Python (needed to run Hadoop Streaming jobs)
INSTALL_SQOOP=true # whether to install Sqoop

Start, Stop and Monitor the stack

Start the stack

docker-compose up -d --build

Stop it

docker-compose down

See logs

docker-compose logs -t -f
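
To follow a single service instead of the whole stack (assuming the service is named namenode in docker-compose.yml):

docker-compose logs -t -f namenode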

Alternatively, you can also create a user and import the data stored in supports/data into HDFS:

chmod +x setup.sh
./setup.sh
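
If the import succeeded, the files should be visible from the host (assuming setup.sh places them under /user):

sudo docker exec namenode hdfs dfs -ls -R /user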

Access HDFS through the namenode container

sudo docker exec -it namenode bash
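
Once inside the container, the usual hdfs dfs commands are available, for example (the paths below are only illustrative):

hdfs dfs -ls /
hdfs dfs -mkdir -p /user/your_name
hdfs dfs -put some_local_file.csv /user/your_name/
hdfs dfs -cat /user/your_name/some_local_file.csv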

Relevant locations

  • hadoop streaming /opt/hadoop-3.1.1/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar
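
A minimal Hadoop Streaming run from inside the namenode container might look like the sketch below; mapper.py, reducer.py and the input/output paths are placeholders, and INSTALL_PYTHON=true is assumed so that Python is available to run the scripts:

hadoop jar /opt/hadoop-3.1.1/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar \
  -files mapper.py,reducer.py \
  -mapper mapper.py \
  -reducer reducer.py \
  -input /user/your_name/input \
  -output /user/your_name/output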

Web interfaces:

Sources

Most sources were gathered from big-data-europe's repos: their main repos provided the base of the docker-compose, to which hue and hiveserver2 were added.

Useful resources

complete list of HDFS commands


License: GNU General Public License v3.0

