junit / dockerized-hadoop

Files to create Hadoop docker images

These are docker images of Apache Hadoop.

Properties

The images have a small footprint (the base Docker image is Alpine Linux).

The available Hadoop services are:

  • HDFS namenode
  • HDFS datanode

Common Hadoop Configuration

The base image provides a custom entrypoint that uses environment variables to set properties in the Hadoop configuration files.

Environment variables must be in the form <PREFIX>_<HADOOP_PROPERTY>.

PREFIX selects the configuration file to write to and must be one of the following:

- CORE_CONF: /etc/hadoop/core-site.xml
- HDFS_CONF: /etc/hadoop/hdfs-site.xml
- YARN_CONF: /etc/hadoop/yarn-site.xml
- HTTPFS_CONF: /etc/hadoop/httpfs-site.xml
- KMS_CONF: /etc/hadoop/kms-site.xml

HADOOP_PROPERTY is the Hadoop property name with the following character replacements:

. => _
_ => __
- => ___

For example:

The fs.defaultFS property of core-site.xml is set with the environment variable:

CORE_CONF_fs_defaultFS

The dfs.replication property of hdfs-site.xml is set with:

HDFS_CONF_dfs_replication
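
The naming rule can be sketched as a small shell function. This is only an illustration of the replacements listed above: `to_env_name` is a hypothetical helper and is not shipped with the images.

```shell
# Hypothetical helper (not part of the images): build the environment
# variable name for a Hadoop property under a given prefix.
to_env_name() {
  prefix=$1
  property=$2
  # Order matters: double the underscores first, then expand dashes,
  # and only then turn dots into single underscores.
  encoded=$(printf '%s' "$property" | sed -e 's/_/__/g' -e 's/-/___/g' -e 's/\./_/g')
  printf '%s_%s\n' "$prefix" "$encoded"
}

to_env_name CORE_CONF fs.defaultFS
to_env_name HDFS_CONF dfs.replication
to_env_name YARN_CONF yarn.log-aggregation-enable
```

The first two calls print the names from the examples above. The third prints YARN_CONF_yarn_log___aggregation___enable, which shows how each dash becomes three underscores.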

Network

To enable multihomed networks, set the environment variable MULTIHOMED_NETWORK.

HDFS Configuration

The hdfs-namenode container accepts a CLUSTER_NAME environment variable, which defaults to "hadoop".

Optional non-Hadoop configuration

The images also accept simple environment variables that are translated into specific Hadoop configuration properties.

  • HDFS_NAMENODE_URL in the form of 'hdfs://NAMENODE_HOST:NAMENODE_PORT'

Example of usage

Example of an HDFS cluster with a single namenode and three datanodes:

docker run -d --name hdfs-namenode -p 9870:9870 junit/hdfs-namenode
docker run -d --link hdfs-namenode --name hdfs-datanode1 -e CORE_CONF_fs_defaultFS=hdfs://hdfs-namenode:8020 junit/hdfs-datanode
docker run -d --link hdfs-namenode --name hdfs-datanode2 -e CORE_CONF_fs_defaultFS=hdfs://hdfs-namenode:8020 junit/hdfs-datanode
docker run -d --link hdfs-namenode --name hdfs-datanode3 -e CORE_CONF_fs_defaultFS=hdfs://hdfs-namenode:8020 junit/hdfs-datanode
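
The three datanode commands differ only in the container name, so they can be generated in a loop. The following is a minimal sketch that prints the commands rather than running them; `datanode_cmd` is a hypothetical helper, and the image names and port are the ones used in the example above.

```shell
# Hypothetical helper: print (do not run) the docker command that starts
# datanode number $1 against the hdfs-namenode container.
datanode_cmd() {
  printf 'docker run -d --link hdfs-namenode --name hdfs-datanode%s -e CORE_CONF_fs_defaultFS=hdfs://hdfs-namenode:8020 junit/hdfs-datanode\n' "$1"
}

# Print one command per datanode.
for i in 1 2 3; do
  datanode_cmd "$i"
done
```

Piping the output to `sh` would start the containers, provided the namenode container is already running.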

Testing native library support:

docker exec -ti hdfs-namenode hadoop checknative -a

Testing by creating and listing an example folder in HDFS:

docker exec -ti hdfs-namenode hdfs dfs -mkdir /example
docker exec -ti hdfs-namenode hdfs dfs -ls /
