This repo contains slides and a demo notebook from my talk at PyCon UK 2018.
You can watch the talk on YouTube.
In this talk I presented:
- A brief introduction to Apache Spark.
- Connecting to a Spark cluster running the Apache Livy REST interface, from Jupyter with sparkmagic and from any Python code with pylivy.
- The basics of loading data into Spark, manipulating it and doing analysis with MLlib.
- Retrieving data back into Jupyter or Python for further analysis.
- An example web app using Plotly Dash, Python RQ and pylivy to build a Spark-powered dashboard using only Python.
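
To give a feel for what talking to Livy involves, here is a minimal sketch of the two REST requests behind an interactive session: creating a PySpark session and submitting a statement to it. It only builds the requests and does not send them. The URL assumes a Livy server on its default port (8998), and the session id `0` and the helper `build_request` are hypothetical, chosen for illustration; this is not taken from the talk's notebook. Clients such as sparkmagic and pylivy handle this kind of exchange for you.

```python
import json
from urllib.request import Request

# Assumption: a Livy server on its default port; replace with your own URL.
LIVY_URL = "http://localhost:8998"


def build_request(path, payload):
    """Build (but do not send) a JSON POST request for the Livy REST API."""
    return Request(
        url=LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Start an interactive PySpark session.
create_session = build_request("/sessions", {"kind": "pyspark"})

# Run a snippet of code in a session. The id 0 is a placeholder: in practice
# it comes from the JSON response to the create-session request.
run_statement = build_request(
    "/sessions/0/statements",
    {"code": "spark.range(10).count()"},
)

# To actually send either request, pass it to urllib.request.urlopen(...).
```

After submitting a statement, a client would poll the statement's URL until its state is `available` and then read the output from the response.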
Questions and feedback are welcome, either as GitHub issues on this repo or directly by email at wacrozier@gmail.com.
pylivy does not yet support many of the features provided by Livy. If you'd like to contribute, please get in touch!