stevenrskelton / SparkOverflow

Spark demo using StackOverflow's 2013 data dump.

Apache Spark Statistics

http://stevenskelton.ca/

Basic setup of an in-memory computation project using StackOverflow's data dump.

Installation (Scala 2.10)

  • Download the Spark source from github [url], scala 2.10 branch

  • Compile the Spark assembly by running sbt assembly in the Spark source directory

  • Create a /lib directory in this project (sbt treats jars placed there as unmanaged dependencies)

  • Copy spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop1.0.4.jar from assembly\target\scala-2.10 in the Spark build to /lib

  • Compile and run this project: sbt package run

Unit tests

Raise the JVM heap limit for the tests by setting the VM argument -Xmx6096m.
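
If the tests are run through sbt rather than from an IDE, the heap setting could be applied in the build definition instead. The fragment below is a minimal sketch, not taken from this repository: it assumes a build.sbt at the project root and sbt 0.13-style syntax (newer sbt versions write these keys as Test / fork and Test / javaOptions).

```scala
// build.sbt fragment (assumed file; sbt 0.13-style syntax)

// Run tests in a separate JVM. sbt applies javaOptions only to forked JVMs,
// so without this line the heap setting below would be ignored.
fork in Test := true

// Give the forked test JVM the maximum heap size stated above.
javaOptions in Test += "-Xmx6096m"
```

When tests are launched from an IDE, the same -Xmx6096m value goes into the VM arguments of the test run configuration.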

About

License:Apache License 2.0


Languages

Language:Scala 100.0%