a-Imantha / average-calculation-map-reduce

Calculating the average of a list of numbers with a MapReduce approach on Hadoop.

Average Calculation Problem with a MapReduce Approach

This is an example program that calculates the average of a list of numbers using MapReduce on the Hadoop framework.

NOTES:

The program should run on a Hadoop cluster. The pom file is configured for Hadoop 2.10; change that to match your Hadoop version.

Package the program as a runnable jar and run it with the following arguments:

  • Input File Location
  • Output Folder Location
  • Maximum number of classes you expect the Mapper to split the problem into (optional, default = 10)
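The argument list above could be handled along these lines. This is a minimal sketch, not the repository's actual driver code: the class name `JobArgs`, its field names, and the validation rules are hypothetical; only the argument order and the default of 10 come from the notes above.

```java
// Hypothetical holder for the three command-line arguments described above.
final class JobArgs {
    // Default taken from the README: 10 classes when the third argument is omitted.
    static final int DEFAULT_MAX_CLASSES = 10;

    final String inputFile;
    final String outputFolder;
    final int maxClasses;

    JobArgs(String inputFile, String outputFolder, int maxClasses) {
        this.inputFile = inputFile;
        this.outputFolder = outputFolder;
        this.maxClasses = maxClasses;
    }

    // Parses: <input file> <output folder> [max classes]
    static JobArgs parse(String[] args) {
        if (args.length < 2 || args.length > 3) {
            throw new IllegalArgumentException(
                "Usage: <input file> <output folder> [max classes]");
        }
        int maxClasses = DEFAULT_MAX_CLASSES;
        if (args.length == 3) {
            // Throws NumberFormatException if the value is not an integer.
            maxClasses = Integer.parseInt(args[2]);
        }
        // Assumption: at least one class is required to hold the numbers.
        if (maxClasses < 1) {
            throw new IllegalArgumentException("max classes must be at least 1");
        }
        return new JobArgs(args[0], args[1], maxClasses);
    }
}
```

Rejecting a class count below 1 is a choice made for this sketch; the actual program may treat such values differently.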

The input is a UTF-8 text file containing a list of numbers, one number per line. Numbers can be either int or double.
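A minimal sketch of reading such input, assuming every value is parsed as a double (which also accepts int literals such as `7`). The class name `NumberLines` is hypothetical, and skipping blank lines is an assumption of this sketch rather than documented behaviour of the program.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns the lines of the input file into numbers.
final class NumberLines {
    static List<Double> parse(List<String> lines) {
        List<Double> numbers = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Assumption: blank lines are ignored instead of being treated as errors.
            if (trimmed.isEmpty()) {
                continue;
            }
            // Double.parseDouble accepts both "3" and "3.5";
            // it throws NumberFormatException on non-numeric text.
            numbers.add(Double.parseDouble(trimmed));
        }
        return numbers;
    }
}
```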

Development Approach

The code includes a Mapper, a Combiner, and a Reducer. The Mapper splits the list of numbers into at most the given number of classes (default 10) and hands them over to the Combiner. The Combiner collapses the classes it receives into a single key called 'Average'. The Reducer then reduces these 'Average' keys to produce the final output.
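The three stages can be illustrated in plain Java, without any Hadoop classes. This is a sketch under stated assumptions, not the repository's implementation: the names `AverageSketch` and `SumCount` are hypothetical, numbers are assigned to classes round-robin, and each partial result carries a sum and a count. Carrying both matters because averaging per-group averages gives the wrong answer when groups differ in size.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AverageSketch {
    // Partial result: a running sum and the number of values it covers.
    static final class SumCount {
        final double sum;
        final long count;

        SumCount(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        SumCount plus(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }
    }

    // Map: spread the numbers over at most maxClasses classes.
    // Assumption: round-robin assignment by position.
    static Map<Integer, List<Double>> map(List<Double> numbers, int maxClasses) {
        Map<Integer, List<Double>> classes = new HashMap<>();
        for (int i = 0; i < numbers.size(); i++) {
            classes.computeIfAbsent(i % maxClasses, k -> new ArrayList<>())
                   .add(numbers.get(i));
        }
        return classes;
    }

    // Combine: collapse all classes into one partial under the key "Average".
    static Map<String, SumCount> combine(Map<Integer, List<Double>> classes) {
        SumCount total = new SumCount(0.0, 0);
        for (List<Double> values : classes.values()) {
            for (double value : values) {
                total = total.plus(new SumCount(value, 1));
            }
        }
        Map<String, SumCount> out = new HashMap<>();
        out.put("Average", total);
        return out;
    }

    // Reduce: merge every "Average" partial and divide once at the end.
    static double reduce(List<SumCount> partials) {
        SumCount total = new SumCount(0.0, 0);
        for (SumCount partial : partials) {
            total = total.plus(partial);
        }
        if (total.count == 0) {
            throw new IllegalArgumentException("no numbers to average");
        }
        return total.sum / total.count;
    }
}
```

For example, if one input split holds 1, 2, 3 and another holds 4, 5, the partials are (sum 6, count 3) and (sum 9, count 2), and the reducer returns 15 / 5 = 3.0. Averaging the two group averages (2.0 and 4.5) would instead give 3.25.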

Languages

Language:Java 100.0%