Rajbirsingh05 / spark-hadoop-airflow

Purpose

This Docker container is intended for learning PySpark programming. It bundles the following components (a quick smoke-test sketch follows the list):

  • Hadoop v3.2.1
  • Spark v2.4.4
  • Conda 3 with Python v3.7
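
As a quick smoke test once the stack is up, you can create a SparkSession from a Jupyter Lab notebook. The sketch below is illustrative rather than taken from this repository: the app name and the local[*] master are assumptions, and the master URL should be changed to the cluster's Spark master (or to yarn) depending on how the container is configured.

    # Minimal PySpark smoke test, meant to be run from Jupyter Lab inside the container.
    # The master "local[*]" and the app name are illustrative assumptions; point the
    # master at the cluster (e.g. spark://<master-host>:7077 or "yarn") as appropriate.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("pyspark-smoke-test")   # illustrative app name
        .master("local[*]")              # assumption: replace with the cluster master or "yarn"
        .getOrCreate()
    )

    # Build a tiny DataFrame and run a trivial aggregation to confirm Spark works.
    df = spark.range(0, 1000)            # single "id" column with values 0..999
    print(df.selectExpr("count(*) AS n", "sum(id) AS total").collect())

    spark.stop()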

After starting the container, you can visit the following web UIs (a reachability sketch follows the list).

  • HDFS
  • YARN
  • Spark
  • Spark History
  • Jupyter Lab
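
The port each UI listens on depends on how the start script maps ports. The snippet below is a hedged sketch that probes the stock default ports on localhost (HDFS NameNode 9870, YARN ResourceManager 8088, Spark master 8080, Spark History Server 18080, Jupyter Lab 8888); all of these hosts and ports are assumptions and may need adjusting to match the container's actual port mappings.

    # Hedged reachability check for the web UIs. The hosts and ports below are the
    # stock defaults and are assumptions -- adjust them to match the port mappings
    # used by the start script.
    from urllib.request import urlopen
    from urllib.error import URLError

    UIS = {
        "HDFS NameNode": "http://localhost:9870",
        "YARN ResourceManager": "http://localhost:8088",
        "Spark Master": "http://localhost:8080",
        "Spark History Server": "http://localhost:18080",
        "Jupyter Lab": "http://localhost:8888",
    }

    for name, url in UIS.items():
        try:
            with urlopen(url, timeout=5) as resp:
                print(f"{name}: reachable ({resp.status}) at {url}")
        except (URLError, OSError) as exc:
            print(f"{name}: not reachable at {url} ({exc})")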

To start the Docker container, run:

bash ./start-docker-container.sh
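
Once the container is running, a simple way to confirm that Spark and HDFS are wired together is to write a small dataset and read it back. The sketch below assumes that fs.defaultFS inside the container points at the cluster's HDFS and that the /tmp path is writable; the path and the app name are illustrative, not taken from this repository.

    # Write a small dataset and read it back. A plain path is used on the
    # assumption that fs.defaultFS resolves it to an HDFS location; the path
    # and app name are illustrative.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-roundtrip").getOrCreate()

    path = "/tmp/hdfs_roundtrip_demo"    # assumption: adjust to a writable HDFS path
    spark.range(0, 100).write.mode("overwrite").parquet(path)

    readback = spark.read.parquet(path)
    print("rows written and read back:", readback.count())

    spark.stop()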

Use the links below to access each portal:

  • Name Node
  • Hadoop Cluster
  • Spark Master
  • History Server
  • Jupyter Lab
  • Hadoop Data Node
  • Airflow Image
  • Spark Worker Node
  • Airflow Scheduler
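
Because the stack also includes an Airflow scheduler, a natural learning exercise is to trigger a Spark job from a DAG. The sketch below is a hedged example that shells out to spark-submit from a BashOperator; the import path assumes an Airflow 2.x-style install (Airflow 1.10 uses airflow.operators.bash_operator), and the DAG id, schedule, dates, and script path are illustrative rather than taken from this repository.

    # Hedged Airflow DAG sketch: run a PySpark script via spark-submit once a day.
    # The import path assumes Airflow 2.x; on Airflow 1.10 import BashOperator from
    # airflow.operators.bash_operator. The script path, schedule, and dates are
    # illustrative assumptions.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "owner": "airflow",
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    }

    with DAG(
        dag_id="spark_job_example",
        default_args=default_args,
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        submit_spark_job = BashOperator(
            task_id="submit_spark_job",
            # Assumption: spark-submit is on PATH inside the Airflow container and
            # the example script exists at this (hypothetical) location.
            bash_command="spark-submit /opt/airflow/dags/scripts/example_job.py",
        )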
