tonellotto / HashToMin

A MapReduce implementation of HashToMin for finding Connected Components in a graph.

HashToMin

A Hadoop MapReduce implementation of the HashToMin algorithm for finding the connected components of a graph. The input file can list either the edges of the graph or the adjacency list of each node: each line describes either an edge or an already formed cluster of the graph. Vertex identifiers must be separated by a space or a tab. The output file contains one connected component per line: the first node is the component's label, followed by a tab and the component's nodes separated by spaces. Sample input files can be found in the inputfiles folder.
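To make the input and output formats concrete, here is a single-machine sketch of the HashToMin rule as published by Rastogi et al.: every node v keeps a cluster C_v, starting as {v} plus its neighbours; in each round, v sends all of C_v to the minimum node of C_v and sends that minimum to every member of C_v, and each node's new cluster is the union of what it receives. This is not the repository's MapReduce code; the class HashToMinSketch and its methods are hypothetical. Two further assumptions are labelled in the comments: all ids on one input line are treated as connected (the first id is linked to each of the others), and each output line's member list is assumed to include the label itself.

```java
import java.util.*;

// Hypothetical single-machine sketch of HashToMin; not the repository's MapReduce code.
// Each pass of the while-loop in hashToMin() stands in for one MapReduce round.
final class HashToMinSketch {

    // Splits a line into vertex ids; ids may be separated by spaces or tabs.
    static List<Long> parseLine(String line) {
        List<Long> ids = new ArrayList<>();
        for (String tok : line.trim().split("[ \t]+")) {
            if (!tok.isEmpty()) ids.add(Long.parseLong(tok));
        }
        return ids;
    }

    // Returns C_v, creating it as {v} the first time v is seen.
    private static Set<Long> clusterOf(Map<Long, Set<Long>> clusters, long v) {
        return clusters.computeIfAbsent(v, k -> new TreeSet<>(Set.of(k)));
    }

    // Builds the initial clusters C_v = {v} plus neighbours(v) from edge or cluster lines.
    // Assumption: all ids on one line are connected, so the first id is linked to each other id.
    static Map<Long, Set<Long>> initialClusters(List<String> lines) {
        Map<Long, Set<Long>> clusters = new HashMap<>();
        for (String line : lines) {
            List<Long> ids = parseLine(line);
            if (ids.isEmpty()) continue;
            long first = ids.get(0);
            Set<Long> firstCluster = clusterOf(clusters, first);
            for (long other : ids.subList(1, ids.size())) {
                firstCluster.add(other);
                clusterOf(clusters, other).add(first);
            }
        }
        return clusters;
    }

    // Repeats HashToMin rounds until no cluster changes.
    static Map<Long, Set<Long>> hashToMin(Map<Long, Set<Long>> clusters) {
        while (true) {
            Map<Long, Set<Long>> next = new HashMap<>();
            for (Set<Long> c : clusters.values()) {
                long vMin = Collections.min(c);
                // "Map" side: send the whole cluster to its minimum node ...
                next.computeIfAbsent(vMin, k -> new TreeSet<>()).addAll(c);
                // ... and send the minimum node to every member of the cluster.
                for (long u : c) next.computeIfAbsent(u, k -> new TreeSet<>()).add(vMin);
            }
            // "Reduce" side is the per-node union built above; stop once a round changes nothing.
            if (next.equals(clusters)) return next;
            clusters = next;
        }
    }

    // One line per component: label, a tab, then the members (ascending) separated by spaces.
    // Assumption: the member list includes the label itself.
    static List<String> format(Map<Long, Set<Long>> converged) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, Set<Long>> e : new TreeMap<>(converged).entrySet()) {
            // At convergence only a component's minimum node still holds the whole component.
            if (Collections.min(e.getValue()).equals(e.getKey())) {
                StringJoiner members = new StringJoiner(" ");
                for (long n : e.getValue()) members.add(Long.toString(n));
                out.add(e.getKey() + "\t" + members);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("1 2", "2\t3", "4 5");
        format(hashToMin(initialClusters(input))).forEach(System.out::println);
    }
}
```

Running main prints two components, `1<TAB>1 2 3` and `4<TAB>4 5`. At the fixed point every non-minimum node v holds only {min}, which is why format() keeps just the entries whose key equals the minimum of their cluster.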

Usage is straightforward. Create an instance of the ConnectedComponents class:

public ConnectedComponents (String input,
                            String output, 
                            int reduceTasksNumber,
                            boolean verifyResult,
                            boolean secondarySort) 

where:

  • input and output specify the input and output file paths;
  • reduceTasksNumber is the number of reduce tasks used by every job except Export, which must write a single output file;
  • verifyResult, when true, also runs the CountNodes and Verifier jobs;
  • secondarySort selects the version of the algorithm: when it is true, HashToMinSecondarySort runs.

Then call run() on the new object.

Alternatively, run the packaged jar from the project folder:

hadoop jar ./target/HashToMin-1.0.jar <input> <output> <numberOfReducers>
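The secondarySort flag selects HashToMinSecondarySort, but the README does not say what that variant sorts. As background only: Hadoop's secondary-sort pattern emits a composite key (natural key plus a value), sorts map output by the full composite key, and partitions and groups reduce calls by the natural key alone, so each reduce call sees its values in ascending order and the first value is the group's minimum. The plain-Java sketch below imitates that shuffle without any Hadoop classes; SecondarySortSketch, CompositeKey and shuffle() are hypothetical names, and nothing here is taken from the repository.

```java
import java.util.*;

// Hypothetical plain-Java imitation of Hadoop secondary sort; no Hadoop classes are used.
final class SecondarySortSketch {

    // Composite key: natural key (the destination node) plus the value to sort on.
    record CompositeKey(long node, long value) {}

    // Sort comparator: order by node, then by value (the role of a Hadoop sort comparator).
    static final Comparator<CompositeKey> SORT =
            Comparator.comparingLong(CompositeKey::node).thenComparingLong(CompositeKey::value);

    // Sorts by the full composite key, then groups by node only (the grouping comparator's role),
    // so every group's values arrive in ascending order.
    static Map<Long, List<Long>> shuffle(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT);
        Map<Long, List<Long>> groups = new LinkedHashMap<>();
        for (CompositeKey k : sorted) {
            groups.computeIfAbsent(k.node(), n -> new ArrayList<>()).add(k.value());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = List.of(
                new CompositeKey(2, 9), new CompositeKey(1, 5),
                new CompositeKey(2, 3), new CompositeKey(1, 4));
        // The first value of each group is its minimum, read without scanning the rest.
        shuffle(mapOutput).forEach((node, values) ->
                System.out.println(node + " min=" + values.get(0) + " all=" + values));
    }
}
```

Running main prints `1 min=4 all=[4, 5]` and then `2 min=3 all=[3, 9]`. The trade-off is a more complex key type in exchange for a reducer that never has to buffer or rescan its values to find the minimum.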

About

License: MIT License