joshuarobinson / spark_docker_flashblade

Automation scripts for creating Spark Standalone clusters in Docker

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Spark Standalone cluster on Docker with FlashBlade

The scripts and tools in this repository demonstrate automation for a Spark Standalone cluster that uses FlashBlade NFS/S3 for persistent storage.

config_s3.py : automates the creation of S3 users, keys, and buckets.

Dockerfile, build_image.sh : create a docker image for running Spark.

control_sparkstandalone.sh : shell script to start and stop a dockerized Spark Standalone cluster.

{GenerateTestData|RunClustering}.ipynb : Python notebooks demonstrating clustering algorithm.

Requirements for using these scripts:

  • Docker installed and access to a Docker repository.
  • Purity_fb python SDK installed (via pip).
  • FlashBlade management token for accessing REST API.
  • Ansible configured with the host group containing all spark worker nodes.
  • credentials: file containing S3 access/secret keys, can be created with config_s3.py

About

Automation scripts for creating Spark Standalone clusters in Docker


Languages

Language:Shell 31.2%Language:Jupyter Notebook 29.0%Language:Python 26.7%Language:Dockerfile 13.0%