nghoanglong / spark-cluster-with-docker

An implementation of Apache Spark (combined with PySpark and Jupyter Notebook) on top of a Hadoop cluster using Docker

Run Spark Cluster within Docker

This is an implementation of a Spark cluster on top of Hadoop (1 master node, 2 slave nodes) using Docker.

Follow these steps on Windows 10:

1. clone the GitHub repo

# Step 1
git clone https://github.com/nghoanglong/spark-cluster-with-docker.git

# Step 2
cd spark-cluster-with-docker

2. pull the Docker image

docker pull ghcr.io/nghoanglong/spark-cluster-with-docker/spark-cluster:1.0

3. start the cluster

docker-compose up

4. access the web UIs

  1. hadoop cluster: http://localhost:50070/
  2. hadoop cluster - resource manager: http://localhost:8088/
  3. spark cluster: http://localhost:8080/
  4. jupyter notebook: http://localhost:8888/
  5. spark history server: http://localhost:18080/
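Once the containers are running, you can check that all five web UIs are reachable with a small Python script (a sketch; the service labels are informal names, and the ports are the ones listed above — the UIs may take a minute to come up after `docker-compose up`):

```python
from urllib.request import urlopen
from urllib.error import URLError

# Web UIs exposed by the cluster (ports from the list above)
ENDPOINTS = {
    "hadoop-namenode": "http://localhost:50070/",
    "resource-manager": "http://localhost:8088/",
    "spark-master": "http://localhost:8080/",
    "jupyter-notebook": "http://localhost:8888/",
    "spark-history": "http://localhost:18080/",
}

def check(url, timeout=5):
    """Return True if the UI answers with HTTP 200, False otherwise."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False

if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        print(f"{name:18s} {'UP' if check(url) else 'DOWN'}")
```

If a service reports DOWN right after startup, wait a little and re-run the script before assuming something is misconfigured.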


Languages

Shell: 78.2%, Dockerfile: 21.8%