As I started using Scalding more, I found it hard to get even a simple job up and running on my local machine. I had to google a lot, so I'm summing up what I learned in this repository in the hope that anyone who stumbles upon this doc will find it useful.
This documentation won't explain Scalding itself, since the Scalding wiki already does that really well.
- Install Scala
brew install scala
- Install sbt
brew install sbt
git clone https://github.com/noppanit/scalding-tutorial.git
- Install Java 8
- Install Hadoop
brew install hadoop
- Install SBT plugin
- Install Scala plugin
IntelliJ should pop up a dialog asking you to import the dependencies from sbt.
- You can follow this step
What's missing from that step is that you need to have "org.apache.hadoop" % "hadoop-core" % "1.2.1" as a dependency. Otherwise, Tool will complain that it's missing the Main class.
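For reference, here is a minimal sketch of how that dependency could sit in build.sbt. Only the hadoop-core line comes from the text above. The project name, version and Scala binary version are inferred from the jar path used in the run command, and the Scalding version and Scala patch version are assumptions, so check them against the build.sbt in this repository.

```scala
// build.sbt (sketch, not the repository's actual file)

// Inferred from the jar name scalding-tutorial-assembly-0.1.jar under target/scala-2.12
name := "scalding-tutorial"
version := "0.1"
scalaVersion := "2.12.8" // assumption: any 2.12.x patch release should work

libraryDependencies ++= Seq(
  // Assumption: a Scalding release published for Scala 2.12
  "com.twitter" %% "scalding-core" % "0.17.4",
  // The dependency this guide says is required so that Tool can be run
  "org.apache.hadoop" % "hadoop-core" % "1.2.1"
)
```

After changing build.sbt, re-import the sbt project in IntelliJ so the new dependency lands on the run configuration's classpath.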
Your Edit Configurations should look like this
Now you can click Run.
It should write the word count of Alice in Wonderland to the file target/data/output0.txt
Have fun!
- Run
sbt clean assembly
- Run
yarn jar target/scala-2.12/scalding-tutorial-assembly-0.1.jar com.twitter.scalding.Tool FirstJob --local
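The assembly task used in the first step above is not built into sbt. It comes from the sbt-assembly plugin. If sbt reports that assembly is not a valid command, a plugin declaration along these lines is probably missing. This is a sketch, and the plugin version shown is an assumption rather than the one this repository pins.

```scala
// project/plugins.sbt (sketch)

// Provides the `assembly` task, which builds the fat jar
// target/scala-2.12/scalding-tutorial-assembly-0.1.jar
// Assumption: version 0.14.10; use whichever release matches your sbt version
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
```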