abhijeetpathak / docker-spark

Docker image for spark in standalone mode

Docker container for a Spark standalone cluster

This repository contains a set of scripts and configuration files to run an Apache Spark standalone cluster from Docker containers.

To run the master, execute:

./start-master.sh
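The script contents are not shown here, but a minimal sketch of the command start-master.sh might run looks like this (the image name "spark" is an assumption; 7077 and 8080 are Spark's default master RPC and web-UI ports):

```shell
# Hypothetical sketch of start-master.sh. The image name "spark" is an
# assumption; --name spark_master is essential so workers and shells can
# link to the master container by that name.
start_master_cmd() {
  echo "docker run -d --name spark_master -p 8080:8080 -p 7077:7077 spark"
}

# Print the command that would be executed.
start_master_cmd
```

The fixed container name is the key detail: the rest of the scripts rely on resolving the hostname spark_master.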

To run a worker, execute:

./start-worker.sh

You can run multiple workers. Each worker finds the master by its container name, "spark_master".
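Because the master container has a fixed name, a worker container can reach it through a Docker link. A hedged sketch of what start-worker.sh might run (the image name and paths inside the container are assumptions):

```shell
# Hypothetical sketch of start-worker.sh: --link makes the hostname
# "spark_master" resolvable inside the worker container, so the worker can
# register at Spark's default master port 7077.
start_worker_cmd() {
  echo "docker run -d --link spark_master:spark_master spark \
/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark_master:7077"
}

# Print the command that would be executed.
start_worker_cmd
```

Running this sketch repeatedly with different container instances is what allows multiple workers against one master.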

To run a Spark shell against this cluster, execute:

./spark-shell.sh

You can run multiple shells. Each shell finds the master by its container name, "spark_master".
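Inside the shell container, the master URL is just the linked container name plus Spark's default port. A sketch of how spark-shell.sh might point the shell at the cluster (image name and paths are assumptions):

```shell
# Hypothetical sketch of spark-shell.sh: an interactive container linked to
# the master, launching spark-shell with the cluster's master URL.
spark_shell_cmd() {
  echo "docker run -it --link spark_master:spark_master spark \
/spark/bin/spark-shell --master spark://spark_master:7077"
}

# Print the command that would be executed.
spark_shell_cmd
```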

If you would like to run another container against this cluster, please read the explanation of how to prepare a driver container.

If you need to increase the memory or core count, or pass any other parameter to Spark, use:

./spark-shell.sh --executor-memory 300M --total-executor-cores 3
./start-worker.sh --memory 700M
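One simple way a wrapper script can support such flags is to forward all of its arguments verbatim onto the Spark command line via "$@". A sketch of that pattern (the spark-class path and master URL are assumptions carried over from the examples above):

```shell
# Hypothetical argument forwarding: anything passed to the script is appended
# unchanged to the Spark worker command, so flags like --memory 700M reach
# Spark directly.
worker_cmd_with_args() {
  echo "/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark_master:7077 $*"
}

# Print the worker command with an extra flag forwarded through.
worker_cmd_with_args --memory 700M
```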

If you run these images without the scripts mentioned above, please:

  • Remember to name the master container spark_master so that container linking works correctly.
  • Read the documentation to understand what is going on.

I also recommend using Zeppelin instead of the Spark shell for working with data. It has a pleasant GUI and IPython-like functionality. Please use a Docker container for that as well.

About

License: Apache License 2.0


Languages

Language: Shell 100.0%