prateek22sri / MapReduce-basic-statistics

Implementation of basic statistics such as min, max, average, and standard deviation of a given data set using MapReduce paradigm


MapReduce_basicStatistics

Cloud Computing project 1

The idea of this project is to get you started with Hadoop and the MapReduce concept. You may have already looked at the WordCount example in both its serial and Hadoop implementations. This problem is similar to WordCount, except that you will compute basic statistics (min, max, average, and standard deviation) of a given data set.

The input to the program is a text file containing exactly one floating-point number per line. The output should include the min, max, average, and standard deviation of these numbers.
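One way to think about the computation is in terms of mergeable partial aggregates. The sketch below is plain Java with no Hadoop dependency, and it is not taken from this repository. The names `BasicStats`, `Partial`, `fromLine`, `merge`, and `compute` are hypothetical, and it assumes the population (not sample) standard deviation, which the assignment text does not specify. Each input line becomes a small record of count, sum, sum of squares, min, and max (the "map" step), and records are merged pairwise (the "reduce" step).

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: basic statistics built from mergeable partial aggregates. */
class BasicStats {

    /** Hypothetical partial aggregate that a mapper could emit and a reducer could merge. */
    static final class Partial {
        final long count;
        final double sum, sumSq, min, max;

        Partial(long count, double sum, double sumSq, double min, double max) {
            this.count = count;
            this.sum = sum;
            this.sumSq = sumSq;
            this.min = min;
            this.max = max;
        }

        /** "Map" step: one input line (one number) becomes a partial for a single value. */
        static Partial fromLine(String line) {
            double x = Double.parseDouble(line.trim());
            return new Partial(1, x, x * x, x, x);
        }

        /** "Reduce" step: merging is associative and commutative, so order does not matter. */
        Partial merge(Partial o) {
            return new Partial(count + o.count, sum + o.sum, sumSq + o.sumSq,
                    Math.min(min, o.min), Math.max(max, o.max));
        }

        double mean() {
            return sum / count;
        }

        /** Population standard deviation (an assumption): sqrt(E[x^2] - E[x]^2). */
        double stdDev() {
            double m = mean();
            // Clamp at zero to guard against tiny negative values from rounding.
            return Math.sqrt(Math.max(0.0, sumSq / count - m * m));
        }
    }

    /** Simulates map followed by reduce over lines held in memory. */
    static Partial compute(List<String> lines) {
        return lines.stream()
                .map(Partial::fromLine)
                .reduce(Partial::merge)
                .orElseThrow(() -> new IllegalArgumentException("empty input"));
    }

    public static void main(String[] args) {
        Partial p = compute(Arrays.asList("2.0", "4.0", "4.0", "4.0", "5.0", "5.0", "7.0", "9.0"));
        System.out.println("min=" + p.min + " max=" + p.max
                + " avg=" + p.mean() + " std=" + p.stdDev());
    }
}
```

Because `merge` is associative and commutative, the same function could in principle serve as both a combiner and a reducer. The sum-of-squares formula is simple but can lose precision when values are large relative to their spread, so a more careful implementation might prefer a different variance update.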

Deliverables

You will need to complete the source code and write a report. Zip your work into a file with the name username project1.zip (replace ’username’ with your own) and submit the following:

  • Complete source code

  • A document with the following details:

      – Transformation of data during the computations, i.e. data type of key, value
      – The data structure used to transfer between Map and Reduce phases
      – How the data flow happens through disk and memory during the computation

    For further details, click here
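As an illustration of what a data structure transferred between the Map and Reduce phases might look like, here is a minimal sketch in plain Java. It is an assumption, not this repository's implementation: the class `StatsRecord` and its fields are hypothetical, and it only mirrors the method shape of Hadoop's `Writable` interface (`write(DataOutput)` and `readFields(DataInput)`) without depending on Hadoop. The round trip through a byte array stands in for the serialization that happens when intermediate values leave the mapper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical intermediate value; mirrors the Writable method shape, no Hadoop needed. */
class StatsRecord {
    long count;
    double sum, sumSq, min, max;

    /** No-argument constructor, so an instance can be created before readFields fills it. */
    StatsRecord() {}

    StatsRecord(long count, double sum, double sumSq, double min, double max) {
        this.count = count;
        this.sum = sum;
        this.sumSq = sumSq;
        this.min = min;
        this.max = max;
    }

    /** Serializes the fields in a fixed order (8 bytes each, 40 bytes in total). */
    void write(DataOutput out) throws IOException {
        out.writeLong(count);
        out.writeDouble(sum);
        out.writeDouble(sumSq);
        out.writeDouble(min);
        out.writeDouble(max);
    }

    /** Reads the fields back in exactly the order write() produced them. */
    void readFields(DataInput in) throws IOException {
        count = in.readLong();
        sum = in.readDouble();
        sumSq = in.readDouble();
        min = in.readDouble();
        max = in.readDouble();
    }

    /** Round trip through bytes, standing in for the map-to-reduce transfer. */
    static StatsRecord roundTrip(StatsRecord record) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            record.write(new DataOutputStream(buffer));
            StatsRecord copy = new StatsRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StatsRecord copy = roundTrip(new StatsRecord(8, 40.0, 232.0, 2.0, 9.0));
        System.out.println("count=" + copy.count + " sum=" + copy.sum
                + " min=" + copy.min + " max=" + copy.max);
    }
}
```

In a design like this, all records could share a single constant key so that one reducer sees every partial result. Whether that suits a given data set size is a design decision for the report.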



Languages

Java 94.6%, Shell 5.4%