Huoran559 / spark-jupyter-livy-hadoop

Standalone Spark cluster with Hadoop HDFS storage and Livy Server for interactive analysis in Jupyter

This project provides a standalone Spark cluster with Hadoop (HDFS) storage and a JupyterLab notebook server. The following projects/blogs were used to create this project:

Docker

  • Clone this repo, then run: docker-compose up --scale spark-worker=2 -d (the optional --scale switch starts two Spark workers)
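
The start-up command can be wrapped in a small helper so that the worker count is a parameter. This is a minimal sketch, assuming docker-compose is on the PATH and the service is named spark-worker as in the command above. The function name start_cluster and the DRY_RUN variable are illustrative and not part of this repo.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): start the stack with a
# configurable number of Spark workers. Defaults to 2 workers.
start_cluster() {
  workers="${1:-2}"
  # Reject anything that is not a positive integer before calling docker-compose.
  case "$workers" in
    ''|*[!0-9]*|0)
      echo "worker count must be a positive integer" >&2
      return 1
      ;;
  esac
  cmd="docker-compose up --scale spark-worker=$workers -d"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Illustrative dry-run switch: print the command instead of running it.
    echo "$cmd"
  else
    $cmd
  fi
}
```

Calling start_cluster 3 from the repo directory would start three workers. Setting DRY_RUN=1 first only prints the command, which is handy for checking it before touching Docker.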

Hadoop

  • Get a shell inside the Hadoop namenode container: docker exec -it namenode /bin/bash
  • Create an HDFS directory for the data: hdfs dfs -mkdir -p data (a relative path resolves under the user's HDFS home directory, here /user/root/data)
  • Load local CSV files (use docker cp to move host files into the container first): hdfs dfs -put ./*.csv /user/root/data
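
The three Hadoop steps can also be run from the host without opening an interactive shell. This is a rough sketch, assuming the container is named namenode as above and that the hdfs command is on the PATH for non-interactive docker exec calls. The names run and upload_csv, the DRY_RUN variable, and the /tmp staging path are illustrative and not part of this repo.

```shell
#!/bin/sh
# Hypothetical helpers (not part of this repo).

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy one host CSV file into the namenode container, then put it into HDFS.
upload_csv() {
  file="$1"
  if [ -z "$file" ]; then
    echo "usage: upload_csv <file.csv>" >&2
    return 1
  fi
  name=$(basename "$file")
  # Create the target directory; -p makes this a no-op if it already exists.
  run docker exec namenode hdfs dfs -mkdir -p /user/root/data
  # Stage the host file inside the container (assumed staging path: /tmp).
  run docker cp "$file" "namenode:/tmp/$name"
  # Upload from the container's local filesystem into HDFS.
  run docker exec namenode hdfs dfs -put "/tmp/$name" /user/root/data
}
```

For example, upload_csv ./sales.csv (a made-up file name) would stage the file under /tmp in the container and then upload it to /user/root/data. Note that hdfs dfs -put refuses to overwrite an existing file unless -f is given.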

Jupyter

Service URLs


Languages

Shell 42.2%, Dockerfile 38.0%, Makefile 11.0%, CSS 8.8%