Materials for the Spark Walkthrough from 24 November 2016.
All data files are also available in the data/
directory.
-
Check out the git repository using this URL:
-
Full transcript of the webinar in the online IPython notebook viewer:
- Transcript.ipynb
- The transcript contains many more details and examples than were shown, including some data preparation and exploration activities.
-
Demo on using XML with Spark
-
Slides (see the transcript for the text):