noppanit / scalding-tutorial

A missing guide to get started on Scalding

A missing guide for Scalding

As I started using Scalding more, I found it hard to get even a simple job up and running on my local machine. I had to google a lot, so I am summing up what I learned in this repository in the hope that anyone who stumbles upon this doc finds it useful.

This documentation doesn't explain Scalding itself, since the Scalding Wiki already does that really well.

How to get started

  1. Install Scala: brew install scala
  2. Install sbt: brew install sbt
  3. Clone this repository: git clone https://github.com/noppanit/scalding-tutorial.git
  4. Install Java 8
  5. Install Hadoop: brew install hadoop

How to get started with IntelliJ

  1. Install the SBT plugin
  2. Install the Scala plugin

IntelliJ should pop up a dialog asking you to import the dependencies from SBT.

How to run the FirstJob

  1. You can follow this step

What's missing from that step is that you need "org.apache.hadoop" % "hadoop-core" % "1.2.1" as a dependency. Otherwise, Tool will complain that the Main class is missing.
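For reference, here is a minimal sketch of how that dependency could sit in build.sbt. Only the hadoop-core line comes from this guide; the scalding-core entry and its version number are my assumptions, so check them against the build.sbt in the repository.

```scala
// build.sbt (sketch)
libraryDependencies ++= Seq(
  // Assumption: a Scalding release published for Scala 2.12; the exact version may differ.
  "com.twitter" %% "scalding-core" % "0.17.4",
  // From this guide: without hadoop-core, Tool fails to start.
  "org.apache.hadoop" % "hadoop-core" % "1.2.1"
)
```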

Your Edit Configurations dialog should look like this:

[Screenshot: IntelliJ Edit Configurations dialog, 2018-06-15]

Now click Run. It should write the word count of Alice in Wonderland to target/data/output0.txt
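If you are wondering what a job like this looks like, below is a rough sketch of a word count written with Scalding's fields-based API. It is not a copy of the FirstJob in this repository: the input path data/alice.txt and the tokenize helper are hypothetical, and only the output path is taken from the text above.

```scala
import com.twitter.scalding._

object FirstJob {
  // Hypothetical helper: lowercase the line, drop punctuation, split on whitespace.
  def tokenize(text: String): Array[String] =
    text.toLowerCase
      .replaceAll("[^a-z0-9\\s]", "")
      .split("\\s+")
      .filter(_.nonEmpty)
}

// Lives in the default package so it can be started as "FirstJob".
class FirstJob(args: Args) extends Job(args) {
  TextLine("data/alice.txt") // assumption: location of the input text
    .flatMap('line -> 'word) { line: String => FirstJob.tokenize(line) }
    .groupBy('word) { _.size } // count occurrences of each word
    .write(Tsv("target/data/output0.txt"))
}
```

The paths are hardcoded here only because the run command in this guide passes nothing but --local; reading them from args instead (for example args("input")) is the more usual choice.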

Have fun!

Run from Command line

  1. Run sbt clean assembly
  2. Run yarn jar target/scala-2.12/scalding-tutorial-assembly-0.1.jar com.twitter.scalding.Tool FirstJob --local
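The assembly task in step 1 is not built into sbt; it normally comes from the sbt-assembly plugin. The sketch below shows settings that would produce the jar name used in step 2. The plugin version and the exact Scala patch version are assumptions on my part, so compare them with the files in the repository.

```scala
// project/plugins.sbt (sketch)
// Assumption: any sbt-assembly release compatible with your sbt version will do.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")

// build.sbt (sketch)
// With these settings sbt-assembly writes
// target/scala-2.12/scalding-tutorial-assembly-0.1.jar by default.
name := "scalding-tutorial"
version := "0.1"
scalaVersion := "2.12.6" // assumption: any 2.12.x gives the scala-2.12 directory
```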

Reference

  1. https://medium.com/@gayani.nan/how-to-run-a-scalding-job-567160fa193

License: Apache License 2.0