vanting / wordcount

Comparing MapReduce to Spark using Wordcount example

MapReduce VS Spark - WordCount Example

This repository compares Hadoop MapReduce and Apache Spark by implementing the same word count program in each.

Requirements

  • A Java/Scala IDE
  • Apache Maven 3.x
  • JVM 6 or 7

General Info

The repository contains two projects: MRWordCount (MapReduce) and SparkWordCount (Spark). Their main source files are:

  • com/stdatalabs/SparkWordcount
    • Driver.scala -- Spark code to perform wordcount
  • com/stdatalabs/mapreduce/wordcount
    • WordCountMapper.java -- Removes special characters from the input and emits a (word, 1) pair for each word to the reducer
    • WordCountReducer.java -- Sums the values for each key (word) to output its count
    • sortingMapper.java -- Reads the output of the first MR job and swaps each (word, count) pair to (count, word)
    • sortingComparator.java -- Sorts the mapper output keys (counts) in descending order before they reach the reducer
    • sortingReducer.java -- Swaps each (count, word) pair back to (word, count) and writes it to the output file
    • WordCountDriver.java -- Driver program that configures and runs the MapReduce jobs
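The two MapReduce jobs listed above can be sketched as plain Java without Hadoop. This is a hypothetical illustration that does not use any framework, and it is not the repository's code. The class and method names (`WordCountPipelineSketch`, `mapLine`, `countWords`, `sortByCountDescending`) are made up. The cleaning rule is an assumption about what "special characters" means: keep ASCII letters, digits and whitespace, then lower-case. `TreeMap`s stand in for Hadoop's shuffle and sort. The sketch needs Java 9+ for `List.of` and `Map.entry`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: mimics the two MapReduce jobs with plain collections (no Hadoop).
public class WordCountPipelineSketch {

    // Job 1, map phase: strip special characters (assumed: anything that is not
    // an ASCII letter, digit or whitespace), lower-case, and emit (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        String cleaned = line.replaceAll("[^A-Za-z0-9\\s]", "").toLowerCase();
        for (String word : cleaned.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Job 1, shuffle + reduce phase: group the 1s by word, then sum them.
    public static Map<String, Integer> countWords(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // stands in for the shuffle
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : mapLine(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int v : group.getValue()) {
                sum += v;
            }
            counts.put(group.getKey(), sum);
        }
        return counts;
    }

    // Job 2: swap to (count, word), sort counts in descending order,
    // then swap back to (word, count).
    public static List<Map.Entry<String, Integer>> sortByCountDescending(Map<String, Integer> counts) {
        // "Sorting mapper" + descending comparator: a reverse-ordered TreeMap keyed by count.
        Map<Integer, List<String>> byCount = new TreeMap<Integer, List<String>>(Comparator.reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        // "Sorting reducer": emit each (count, word) back as (word, count).
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> group : byCount.entrySet()) {
            for (String word : group.getValue()) {
                result.add(Map.entry(word, group.getKey()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the cat, the hat!", "The cat sat.");
        for (Map.Entry<String, Integer> e : sortByCountDescending(countWords(input))) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running `main` prints `the 3`, `cat 2`, `hat 1` and `sat 1`, one tab-separated pair per line. The second job is needed because the first job's output is ordered by key (the word), not by count. Swapping each pair to (count, word) lets the sort phase order records by count before they are swapped back.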

Description

More articles on the Hadoop technology stack are available at stdatalabs.


Languages

Java 82.6%, Scala 17.4%