All Data, Relevant Information, Scripts, and Applications for the Open Data Science Conference (2018)
Download and install Spark from http://spark.apache.org/downloads.html, or fetch the 2.3.1 build directly from https://www.apache.org/dyn/closer.lua/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz
Install Maven 3 if you want to build the Streaming Trend Discovery code:
brew install maven
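For the Spark download mentioned above, here is a minimal sketch of unpacking the tarball and putting spark-shell on the PATH. The function name install_spark and the default destination ($HOME) are hypothetical, and the sketch assumes the archive unpacks into a folder named after the file, minus the .tgz suffix.

```shell
# Hypothetical helper: unpack an already-downloaded Spark tarball and expose
# its bin/ directory on PATH for the current shell session.
# Usage: install_spark path/to/spark-2.3.1-bin-hadoop2.7.tgz [destination_dir]
install_spark() {
  tarball="$1"
  dest="${2:-$HOME}"   # assumption: unpack into $HOME unless told otherwise

  # Extract the archive into the destination directory.
  tar -xzf "$tarball" -C "$dest" || return 1

  # Assumption: the top-level folder matches the tarball name without .tgz.
  SPARK_HOME="$dest/$(basename "$tarball" .tgz)"
  export SPARK_HOME
  export PATH="$SPARK_HOME/bin:$PATH"
}
```

After running it, `spark-shell` should resolve from any directory in that session. Add the two export lines to your shell profile if you want the setting to persist.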
Wine Reviews - Thanks to zynicide and kaggle.com for the data set.
Run all commands from the root of the odsc directory.
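Because the script paths used here are relative, running from the wrong directory fails with a file-not-found error inside spark-shell. A small guard, sketched here, can catch that earlier. The function name check_odsc_root is hypothetical, and it only checks for the part2 and data/winereviews folders that the commands in this README refer to.

```shell
# Hypothetical helper: succeed only if the given directory (default: the
# current one) contains the folders the commands in this README rely on.
check_odsc_root() {
  dir="${1:-.}"
  for sub in part2 data/winereviews; do
    if [ ! -d "$dir/$sub" ]; then
      echo "missing $sub: run from the root of the odsc directory" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `check_odsc_root && spark-shell -i part2/coffee/basics.scala` starts the shell only when the layout looks right.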
1. spark-shell -i part2/coffee/basics.scala
2. spark-shell -i part2/coffee/dataframes.scala
3. Requires 2 terminal windows.
3a. nc -lk 9999
3b. spark-shell -i part2/streaming_coffee.scala
Then type lines such as these into the nc window:
folgers,1
folgers,2,"gross"
ritual,5,"awesome"
four barrel,5,"great"
four barrel,5,"great stuff"
four barrel,5,"really great stuff"
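To sanity-check the sample lines above without starting Spark, the sketch below averages the rating per coffee with awk. It assumes each line is name,rating with an optional quoted comment, and that neither the name nor the comment contains a comma. The function name summarize_ratings is hypothetical, and this is not a description of what the streaming script computes.

```shell
# Hypothetical helper: read lines shaped like name,rating[,"comment"] on stdin
# and print name,average_rating,count for each coffee, sorted by name.
summarize_ratings() {
  awk -F',' '
    # Skip lines without at least a name and a rating.
    NF >= 2 { sum[$1] += $2; n[$1]++ }
    END { for (k in n) printf "%s,%.2f,%d\n", k, sum[k] / n[k], n[k] }
  ' | LC_ALL=C sort
}
```

Piping the six sample lines through summarize_ratings prints `folgers,1.50,2`, `four barrel,5.00,3`, and `ritual,5.00,1`.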
cd data/winereviews && unzip winemag-csv.zip && unzip winemag-json.zip
spark-shell -i part2/wine/hello-wine.scala
spark-shell -i part2/wine/wine_reviews.scala
spark-shell -i part2/wine/wine_reviews_json.scala