cloudera-jupyter-notebook-spark

Settings for using JupyterHub and Jupyter Notebook with CDH5 and PySpark.

Includes kernels for the Spark version included with CDH5 (currently 1.6) and for the latest released version of Spark.

Prereqs

Kernels are located in /usr/local/share/jupyter/kernels

Move the pysparkCDH and pysparkLatest folders into that directory.
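The kernel installation step can be sketched as a small shell function. This is a minimal sketch, not part of this repository: install_kernels is a hypothetical helper name, and it copies the folders instead of moving them so the checkout stays intact.

```shell
# install_kernels is a hypothetical helper, not part of this repository.
# Usage: install_kernels SOURCE_DIR [KERNEL_DIR]
# SOURCE_DIR is assumed to contain the pysparkCDH and pysparkLatest folders.
install_kernels() {
    src="$1"
    dest="${2:-/usr/local/share/jupyter/kernels}"
    mkdir -p "$dest" || return 1
    for name in pysparkCDH pysparkLatest; do
        if [ ! -d "$src/$name" ]; then
            echo "missing kernel folder: $src/$name" >&2
            return 1
        fi
        # Copy rather than move so the source folders are left untouched.
        cp -r "$src/$name" "$dest/" || return 1
    done
}
```

For example, running install_kernels . from the directory that holds both folders (usually as root, since the default target is under /usr/local/share) and then jupyter kernelspec list should show both kernels.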

hadoop fs -mkdir /lib
hadoop fs -put /opt/spark/jars/*.jar /lib

hadoop fs -put /opt/spark/yarn /lib
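After the uploads it may be worth confirming that both directories exist in HDFS before changing any YARN settings. The following is a minimal sketch: check_hdfs_lib and the HADOOP_CMD override are hypothetical names introduced here, and the check relies only on hadoop fs -test -d.

```shell
# check_hdfs_lib is a hypothetical helper, not part of this repository.
# HADOOP_CMD is an assumed override for pointing at a different hadoop binary.
check_hdfs_lib() {
    hadoop_cmd="${HADOOP_CMD:-hadoop}"
    for dir in /lib /lib/yarn; do
        # "fs -test -d" exits with status 0 only if the path is a directory.
        if ! "$hadoop_cmd" fs -test -d "$dir"; then
            echo "missing HDFS directory: $dir" >&2
            return 1
        fi
    done
    echo "HDFS library directories are present"
}
```

Calling check_hdfs_lib on a node with a configured Hadoop client reports the first missing directory on stderr and returns a non-zero status.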

Add hdfs://hdfs-cluster:8020/lib/yarn/* to the yarn.application.classpath property in yarn-site.xml

Configure JAVA_HOME on all worker nodes to point to a Java 8 installation; this can be done in Cloudera Manager
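To confirm the setting on a given node, the output of java -version can be checked for a 1.8 version string. This is a minimal sketch; is_java8 is a hypothetical helper name, and it assumes the usual output format in which Java 8 reports itself as version "1.8.x".

```shell
# is_java8 is a hypothetical helper, not part of this repository.
# It reads `java -version` output on stdin and succeeds only if the
# reported version string starts with 1.8.
is_java8() {
    grep -q 'version "1\.8\.'
}

# Example check on one node (java -version prints to stderr, hence 2>&1):
# "$JAVA_HOME/bin/java" -version 2>&1 | is_java8 && echo "Java 8 OK"
```

The commented example line has to be run on each worker node, for instance over ssh, because JAVA_HOME is resolved locally.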

Make sure the HDFS path in the Spark configuration is correct for your setup and adjust it if needed, then place the configuration into /opt/spark/conf