Origin: https://github.com/lisy09/hadoop-docker
This project sets up a Hadoop cluster on Docker on a single machine for development and testing.
It is inspired by the big-data-europe/docker-hadoop project and adds additional features.
build_scripts/
: scripts for building
base/
: base docker image, installs JDK & Hadoop
conf/
: Hadoop's XML configuration files, which should be placed under the Hadoop containers' /etc/hadoop
datanode/
: docker image for the Hadoop datanode
namenode/
: docker image for the Hadoop namenode
nodemanager/
: docker image for the Hadoop nodemanager
resourcemanager/
: docker image for the Hadoop resourcemanager
submit/
: example docker image to submit a Hadoop application, wordcount.jar
test_scripts/
: scripts for running examples, e.g., wordcount
.env
: dotenv config file for building/running
Please refer to the official Hadoop Java Versions document when selecting a Java version.
You can modify the Java version in ./.env and rebuild all the images yourself.
This repo uses Java 8 (OpenJDK) by default.
You can modify the Hadoop version in ./.env and rebuild all the images yourself.
To push to your own docker registry, modify DOCKER_REPO in ./.env.
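As a rough sketch, the relevant ./.env entries might look like the following; DOCKER_REPO comes from this repo, while the Java/Hadoop variable names and all values are assumptions, so check the actual file:

```sh
# Hypothetical ./.env sketch -- see the repo's real file for actual names/values.
DOCKER_REPO=registry.example.com/hadoop-docker   # image repository prefix (real variable)
JAVA_VERSION=8                                   # assumed variable name
HADOOP_VERSION=3.2.1                             # assumed variable name
```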
To change the Hadoop configuration, modify ./conf/*.xml.
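For illustration, a minimal conf/core-site.xml might look like the following sketch; fs.defaultFS is a standard Hadoop property, but the hostname and port below are assumptions that must match your deployment:

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch of conf/core-site.xml; adjust to your cluster. -->
<configuration>
  <property>
    <!-- Standard Hadoop property naming the default filesystem. -->
    <name>fs.defaultFS</name>
    <!-- "namenode" is assumed to be the namenode container's hostname. -->
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>
```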
- The build environment needs to be linux/amd64 or macos/amd64
- The build environment needs the Docker engine installed
- The build environment needs GNU make > 3.8 installed
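A quick way to check these prerequisites from a shell (output varies by system):

```sh
uname -sm          # expect "Linux x86_64" or "Darwin x86_64"
docker version     # verifies the Docker client and engine are available
make --version     # expect GNU Make newer than 3.8
```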
To build all docker images locally:

```sh
make all
```
To push built docker images to the remote registry:

```sh
make push
```
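Pushing assumes you are authenticated against the registry referenced by DOCKER_REPO; the registry host below is illustrative:

```sh
# Log in once before pushing (replace with your actual registry host).
docker login registry.example.com
```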
To delete built local docker images:

```sh
make clean
```
Or you can check ./Makefile for more details.
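After a successful `make all`, the built images should be visible locally; the grep pattern below is an assumption about how the images are named:

```sh
# List locally built images; filter is a naming assumption.
docker image ls | grep hadoop
```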
- Build the images as described in the section above
- Have docker-compose installed
Please modify ./conf/*.xml when you change the Hadoop version.
To deploy an example HDFS cluster, run:

```sh
docker-compose --env-file .env -f docker-compose.yml up -d
```

or

```sh
make deploy
```
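To confirm the cluster containers came up, standard docker-compose usage (not specific to this repo) works:

```sh
# Show the state of the services defined in docker-compose.yml.
docker-compose --env-file .env -f docker-compose.yml ps
```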
To undeploy an example HDFS cluster, run:

```sh
docker-compose --env-file .env -f docker-compose.yml down
```

or

```sh
make undeploy
```
`docker-compose` creates a docker network that can be found by running `docker network list`, e.g. `dockerhadoop_default`. Run `docker network inspect` on the network (e.g. `dockerhadoop_default`) to find the IP the Hadoop interfaces are published on; a concrete example follows the URL list below. Access these interfaces with the following URLs:
- Namenode: http://<dockerhadoop_IP_address>:9870/dfshealth.html#tab-overview
- History server: http://<dockerhadoop_IP_address>:8188/applicationhistory
- Datanode: http://<dockerhadoop_IP_address>:9864/
- Nodemanager: http://<dockerhadoop_IP_address>:8042/node
- Resource manager: http://<dockerhadoop_IP_address>:8088/
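For example, the container IPs on that network can be listed like this; the network name is the example above, and the `--format` template is standard docker CLI:

```sh
# Print each container on the network together with its IPv4 address.
docker network inspect dockerhadoop_default \
  --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{println}}{{end}}'
```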
To test, run the example wordcount job:

```sh
make wordcount
```
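For reference, a minimal sketch of what such a wordcount test typically does inside the cluster; the file names, jar arguments, and output path are assumptions, not this repo's exact script:

```sh
# Hypothetical manual equivalent of "make wordcount":
hdfs dfs -mkdir -p /input                  # create an input dir in HDFS
hdfs dfs -put sample.txt /input/           # stage some local text (assumed name)
hadoop jar wordcount.jar /input /output    # submit the example job
hdfs dfs -cat /output/part-r-00000         # read the reducer output
```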