borkin8r / hadoop-presentation

Some example map reduce jobs and data for an intro presentation.

A few simple map reduce jobs for demonstration and learning purposes.

  • Configuration for building and compiling the project lives in the build.xml properties.
  • Assumes the HADOOP_HOME environment variable is set in your bash profile.
  • Be sure to change the package name in each Hadoop run command (wordcount-run, etc.).

Wordcount

Wordcount is a job that counts the number of words across multiple files.
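
As a rough illustration of the idea, here is a minimal plain-Java sketch of the map and reduce steps collapsed into one method. It is not the code from this repo and does not use the Hadoop API. The class name WordCountSketch, the whitespace tokenization, and the per-word tally are assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the job's logic; not the repo's actual classes.
class WordCountSketch {

    // Each string stands for one line read from any of the input files.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // "Map" step: split the line into words (whitespace split is an assumption).
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // blank lines yield one empty token; skip it
                }
                // "Reduce" step: add 1 to the running total for this word.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the cat sat", "the dog sat down");
        wordCount(lines).forEach((word, n) -> System.out.println(word + " " + n));
    }
}
```

In a real Hadoop job the two steps would live in separate mapper and reducer classes, with the framework grouping values by key in between.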

ant tasks:

  • wordcount-clean
  • wordcount-upload-input
  • wordcount-compile
  • wordcount-run
  • wordcount-output

To run all at once: wordcount

Wordlength

Wordlength is a job that counts how many times words of each length occur. For example:

3 5
5 10
8 7
10 2

This means there are 5 words of length 3, 10 words of length 5, and so on.
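
The same tally can be sketched in plain Java by keying on word length instead of the word itself. This is only an illustration under assumptions (whitespace tokenization, the hypothetical class name WordLengthSketch); it is not the repo's code and does not use the Hadoop API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the job's logic; not the repo's actual classes.
class WordLengthSketch {

    static Map<Integer, Integer> lengthCounts(List<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // skip the empty token produced by blank lines
                }
                // Key is the word's length, value is how many such words were seen.
                counts.merge(word.length(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the cat sat", "hello world");
        // Prints one "length count" pair per line, in the layout shown above.
        lengthCounts(lines).forEach((len, n) -> System.out.println(len + " " + n));
    }
}
```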

ant tasks:

  • wordlength-clean
  • wordlength-upload-input
  • wordlength-compile
  • wordlength-run
  • wordlength-output

To run all at once: wordlength

ustrades

This job totals how much trade each country conducts with the US. For example:

"Afghanistan" 1,003,402
"Canada" 423,492,392

This means Afghanistan has traded 1,003,402 units and Canada has traded 423,492,392 units.
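
A per-country total like this can be sketched in plain Java as a sum grouped by key. The input layout is not described here, so the "country,units" record format, the sample values, and the class name UsTradesSketch are all assumptions made for illustration; this is not the repo's code and does not use the Hadoop API.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the job's logic; not the repo's actual classes.
class UsTradesSketch {

    // Each record is assumed to look like "country,units" (made-up format).
    static Map<String, Long> totalByCountry(List<String> records) {
        Map<String, Long> totals = new TreeMap<>();
        for (String record : records) {
            String[] parts = record.split(",", 2);
            String country = parts[0].trim();
            long units = Long.parseLong(parts[1].trim());
            // Sum the units for every record that shares the same country key.
            totals.merge(country, units, Long::sum);
        }
        return totals;
    }

    // Formats one output line with a quoted country and comma-grouped digits.
    static String format(String country, long total) {
        return String.format(Locale.US, "\"%s\" %,d", country, total);
    }

    public static void main(String[] args) {
        List<String> records = List.of("Canada,100", "Mexico,5", "Canada,250"); // made-up data
        totalByCountry(records).forEach((c, t) -> System.out.println(format(c, t)));
    }
}
```

Totals are held in a long because the example figures above are already in the hundreds of millions, which leaves little headroom in an int.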

ant tasks:

  • ustrades-clean
  • ustrades-upload-input (does nothing, due to the size of the input data)
  • ustrades-compile
  • ustrades-run
  • ustrades-output

To run all at once: ustrades
