amcasari / wwconnect-2016-spark4everyone

[WW]Connect 2016 workshop: Apache Spark For Everyone

Installfest

----

Spark

Download a prebuilt Spark release

http://spark.apache.org/downloads.html
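Once the archive is unpacked, it can help to confirm that the launcher scripts are where you expect them before moving on to the notebooks. The Python sketch below is only an illustration: `find_launchers` is a hypothetical helper (not part of Spark), and it assumes `SPARK_HOME` points at the directory you extracted the release into.

```python
import os

# Launcher scripts that ship in the bin/ directory of a prebuilt Spark release.
LAUNCHERS = ("spark-shell", "pyspark", "sparkR", "spark-submit")


def find_launchers(spark_home):
    """Return {launcher name: full path} for each launcher found under spark_home/bin."""
    bin_dir = os.path.join(spark_home, "bin")
    found = {}
    for name in LAUNCHERS:
        path = os.path.join(bin_dir, name)
        if os.path.isfile(path):
            found[name] = path
    return found


if __name__ == "__main__":
    # Assumption: SPARK_HOME is set to the unpacked release; otherwise use the current directory.
    home = os.environ.get("SPARK_HOME", ".")
    for name, path in sorted(find_launchers(home).items()):
        print(name, "->", path)
```

If nothing is printed, the directory is probably not the top level of the unpacked release.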

INTERACTIVE NOTEBOOKS

This section covers setting up Apache Zeppelin, Jupyter, RStudio, and Databricks Community Edition.

Apache Zeppelin

  • An interactive, web-based notebook platform for data analysis, currently incubating at Apache.

  • Multiple language backends, including flavors of Spark, which means we don't have to install separate kernels, modules, plugins, or libraries to use it! Supports Scala (with Apache Spark), Python (with Apache Spark), SparkSQL, Hive, Markdown, and Shell.

  • Learn more at https://zeppelin.incubator.apache.org/

Installing Zeppelin at the command line:

Clone the Repo

git clone https://github.com/apache/incubator-zeppelin

Build/Install (run from inside the cloned incubator-zeppelin directory)

mvn install -DXmx512m -DXX:MaxPermSize=256m -DskipTests -Dspark.version=1.6.0 -Dhadoop.version=2.4.0
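The clone and build steps above need git, Maven, and a Java installation on your PATH. A minimal sketch for checking that up front, assuming Python 3 is available; `missing_tools` is a hypothetical helper name, and the default tool list is an assumption based on the commands shown here:

```python
import shutil


def missing_tools(tools=("git", "mvn", "java")):
    """Return the names from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install these before building:", ", ".join(missing))
    else:
        print("git, mvn and java were all found on the PATH")
```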

Start the Zep

bin/zeppelin-daemon.sh start

Open in browser

http://localhost:8080/
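If the page does not load, it may simply be that the daemon has not finished starting. The sketch below checks whether anything is accepting connections on the port; `is_listening` is a hypothetical helper, and the defaults assume Zeppelin is on localhost port 8080 as in the URL above.

```python
import socket


def is_listening(host="localhost", port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    print("Zeppelin port open:", is_listening())
```

A `True` result only means the port is open, not that the notebook UI has fully loaded.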

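For now, if you want access to the spark-csv package, you need to run a dependency cell as the very first cell. The Scala suffix in the artifact name (_2.11 here) has to match the Scala version your Spark build uses.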

%dep
z.reset()
z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
z.load("com.databricks:spark-csv_2.11:1.4.0")
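With the package loaded, a later cell can read a CSV file through the spark-csv data source. This is a sketch rather than the workshop's own code: `load_csv` is a hypothetical helper, and it assumes you pass in the `sqlContext` object that the notebook's Python interpreter provides.

```python
def load_csv(sql_context, path, header=True, infer_schema=True):
    """Read a CSV file into a DataFrame through the spark-csv data source."""
    return (
        sql_context.read
        .format("com.databricks.spark.csv")                 # data source name used by spark-csv
        .option("header", str(header).lower())              # "true": first line holds column names
        .option("inferSchema", str(infer_schema).lower())   # "true": guess column types
        .load(path)
    )
```

In a notebook cell this might be called as `df = load_csv(sqlContext, "/path/to/data.csv")`, where the path is a placeholder for your own file.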

Jupyter

While Jupyter runs code in many different programming languages, Python is a prerequisite for installing Jupyter notebook.

Download Jupyter as part of the Anaconda Python distribution

https://www.continuum.io/downloads

Jupyter Project installs for most platforms:

http://jupyter.readthedocs.org/en/latest/install.html

You can also get Jupyter with Anaconda's 'conda' tool or, if you don't have Anaconda, with pip. Run whichever one of these matches your setup:

conda install jupyter
pip3 install jupyter
pip install jupyter
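To confirm the install landed in the Python you intend to use, you can ask that interpreter whether it can see the Jupyter modules. A minimal sketch; `is_installed` is a hypothetical helper, and the module names checked (`jupyter_core`, `notebook`) are the ones a standard Jupyter install is assumed to provide.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be imported by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("jupyter_core", "notebook"):
        print(name, "found" if is_installed(name) else "NOT found")
```

Run it with the same `python` (or `python3`) that matches the pip or conda you used above.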

RStudio

Spark has included SparkR since version 1.4, along with a REPL called sparkR. SparkR can also be used within interactive environments such as RStudio.

You may need to install R

For Mac: get the package here: https://cran.rstudio.com/bin/macosx/ or use a package manager like Homebrew

For Windows: ?

Download RStudio Desktop Version

https://www.rstudio.com/products/rstudio/download/

Databricks Community Edition

A free version of the Databricks Spark platform for learning. There is a waiting list for accounts. No local installation needed! Awesome!


QUICK LINKS

Platforms:

Spark Further:

License: MIT License