gangodu / mapreduce

Group input splits based on city and calculate total wages using Hadoop Map Reduce

City-based wages analyzer

Working:
    Compilation:
        Compile the program to produce the class files, then package them into a JAR that Hadoop can run against data stored in HDFS
        Preferred way to create the JAR: Eclipse IDE
        Extract input.zip to get input.csv

    Execution:
        A sample input file is provided at input/input.csv
        Upload the JAR and the input file to HDFS
            [Load the JAR to either the home directory or the bin directory in HDFS]
        At the terminal, run the following
            {
                hadoop jar [jar name] input/input.csv output/output.csv
            }
        An output directory is automatically created
        When the job is completed, the output directory will have output.csv
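What the job computes can be illustrated without a cluster. The following is a minimal plain-Java sketch of the map, shuffle, and reduce phases; it does not use the Hadoop API and is not the repository's actual code. The class name `CityWageSketch`, its method names, and the column layout (city in column 0, wage in column 1) are assumptions made for illustration, since the real layout of input.csv is not described here.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map, shuffle, and reduce phases.
// It does not use the Hadoop API and is not the repository's code.
class CityWageSketch {

    // Map phase: turn one CSV line into a (city, wage) pair.
    // ASSUMPTION: column 0 is the city and column 1 is the wage.
    static Map.Entry<String, Double> map(String csvLine) {
        String[] fields = csvLine.split(",");
        return new SimpleEntry<>(fields[0].trim(), Double.parseDouble(fields[1].trim()));
    }

    // Shuffle phase: collect all wages that share the same city key.
    static Map<String, List<Double>> shuffle(List<Map.Entry<String, Double>> pairs) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (Map.Entry<String, Double> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the wages collected for one city.
    static double reduce(List<Double> wages) {
        double total = 0.0;
        for (double wage : wages) {
            total += wage;
        }
        return total;
    }

    // Run the three phases in order over a list of input lines.
    static Map<String, Double> totalWagesByCity(List<String> lines) {
        List<Map.Entry<String, Double>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(map(line));
        }
        Map<String, Double> totals = new TreeMap<>();
        for (Map.Entry<String, List<Double>> entry : shuffle(pairs).entrySet()) {
            totals.put(entry.getKey(), reduce(entry.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows, not taken from input.csv.
        List<String> sample = Arrays.asList("Austin,1000.0", "Dallas,500.0", "Austin,250.0");
        System.out.println(totalWagesByCity(sample)); // prints {Austin=1250.0, Dallas=500.0}
    }
}
```

In a real Hadoop job the framework performs the shuffle between mapper and reducer tasks; here it is written out only to make the grouping by city visible.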

    Analysis:
        Depending on the need, a program such as Microsoft Excel can be used to analyze the output.
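For a quick check without a spreadsheet, a simple analysis step such as ranking cities by total wages can also be sketched in plain Java. This is only an illustration under stated assumptions: the class `WageRanking` and its method are hypothetical, and each result line is assumed to have the form `city,total`, which may not match the job's real output format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for ranking cities by their total wages.
class WageRanking {

    // ASSUMPTION: each result line looks like "city,total".
    // Returns the city names ordered from highest to lowest total.
    static List<String> citiesByTotalDescending(List<String> resultLines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : resultLines) {
            rows.add(line.split(","));
        }
        // Sort on the numeric total in column 1, largest first.
        rows.sort(Comparator.comparingDouble(
                (String[] row) -> Double.parseDouble(row[1].trim())).reversed());
        List<String> cities = new ArrayList<>();
        for (String[] row : rows) {
            cities.add(row[0].trim());
        }
        return cities;
    }

    public static void main(String[] args) {
        // Hypothetical result lines, not real job output.
        List<String> results = Arrays.asList("Austin,1250.0", "Dallas,500.0", "Houston,3000.5");
        System.out.println(citiesByTotalDescending(results)); // prints [Houston, Austin, Dallas]
    }
}
```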

Future:
More tools from the Hadoop ecosystem will be added for analysis, and the program itself will be made more generic and abstract so that it can efficiently produce the required output from any given input data set.

Nice to know:
    Hadoop version      :   1.5.x [stable]
    Map Reduce version  :   2.7
    Eclipse IDE version :   Kepler
    OS Developed On     :   Mac OS X Yosemite
    OS Tested On        :   RHEL
    Cluster Config      :   
      Hadoop-
          Name Node       :   1
          Data Nodes      :   2
      Map Reduce-
          Job Trackers    :   1
          Task Trackers   :   2
      Replication Factor  :   2
      Number of Mappers   :   10
      Number of Reducers  :   2
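With two reducers, each city key has to be routed to exactly one of them so that all wages for a city are summed in the same place. The sketch below shows the hash-and-modulo rule that Hadoop's default HashPartitioner applies. Whether this job relies on the default partitioner is an assumption, and the class `PartitionSketch` is hypothetical. It uses `String.hashCode()` for illustration, so the partition numbers will differ from those Hadoop computes for its own key types.

```java
// Illustration of hash partitioning of city keys across reducers.
// ASSUMPTION: the job uses default-style hash partitioning.
class PartitionSketch {

    // Clear the sign bit so the hash is non-negative, then take the
    // remainder to pick a reducer index in the range [0, numReducers).
    static int partitionFor(String city, int numReducers) {
        return (city.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 2; // matches "Number of Reducers" above
        // Hypothetical city names, not taken from input.csv.
        for (String city : new String[] {"Austin", "Dallas", "Houston"}) {
            System.out.println(city + " -> reducer " + partitionFor(city, numReducers));
        }
    }
}
```

Because the rule depends only on the key, every record for the same city reaches the same reducer, regardless of which of the mappers emitted it.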

License: GNU General Public License v2.0


Languages

Language: Java 100.0%