pyspark-pictures
Learn the pyspark API through pictures and simple examples
View on NBViewer
RDD Example:
# flatMap
x = sc.parallelize([1,2,3])
y = x.flatMap(lambda x: (x, 100*x, x**2))
print(x.collect())
print(y.collect())
[1, 2, 3]
[1, 100, 1, 2, 200, 4, 3, 300, 9]
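To see why the output above is one flat list rather than a list of tuples, here is a minimal pure-Python sketch of the same idea. It does not use Spark. The helper names `flat_map` and `expand` are hypothetical and only for illustration; they are not part of the pyspark API.

```python
from itertools import chain

def flat_map(func, items):
    """Apply func to each item, then flatten the results by one level."""
    return list(chain.from_iterable(func(item) for item in items))

def expand(n):
    # Same per-element function as the lambda in the RDD example
    return (n, 100 * n, n ** 2)

x = [1, 2, 3]
mapped = [expand(n) for n in x]   # map-like: one tuple per input element
flattened = flat_map(expand, x)   # flatMap-like: tuples merged into one list

print(mapped)
print(flattened)
```

The map-style result keeps one tuple per input element, while the flatMap-style result concatenates those tuples, which matches the output of `y.collect()` above.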
Install (for interactive use)
- install Spark
- install IPython notebook
Quick Start
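- start pyspark inside IPython notebook:
  IPYTHON_OPTS="notebook" pyspark
  (IPYTHON_OPTS was removed in Spark 2.0; on newer versions use: PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark)
- open browser to notebook link
- open pyspark-pictures.ipynb or pyspark-pictures-dataframes.ipynb
- edit example code, press: ctrl + enter to run each cell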
References
Contribute
Contributors are welcome
The original images are here. To regenerate them, download to pdf, then convert to svg with: genSVD (pdf2svg)