dumoulma / pagerank-mr

Mapreduce Implementation of the pagerank algorithm

pagerank-mr

Usage: Generate some data with DataGenerator. You can set the size of the generated file by changing the constants in the source.

If requested, I could make it read these values from a configuration file or from command-line arguments. Since that is not the point of this demo code, the values are hard-coded.
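To illustrate the idea of a generator whose output size is driven by constants, here is a minimal, dependency-free sketch. It is not the repository's DataGenerator: the class name `GeneratorSketch`, the constants `NUM_PAGES`, `MAX_OUT_LINKS`, and `SEED`, and the tab-separated text format are all assumptions made for this example (the real tool works with Hadoop sequence files).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class GeneratorSketch {
    // Hypothetical constants; the real DataGenerator uses its own names and values.
    static final int NUM_PAGES = 5;      // number of pages (nodes) to generate
    static final int MAX_OUT_LINKS = 3;  // upper bound on out-links per page
    static final long SEED = 42L;        // fixed seed so runs are reproducible

    /**
     * Builds one line per page: the page id followed by tab-separated target ids.
     * Self-links and duplicate targets are possible in this simple sketch.
     */
    static List<String> generate(int numPages, int maxOutLinks, long seed) {
        Random random = new Random(seed);
        List<String> lines = new ArrayList<>();
        for (int page = 0; page < numPages; page++) {
            StringBuilder line = new StringBuilder().append(page);
            int outLinks = random.nextInt(maxOutLinks + 1); // 0..maxOutLinks
            for (int k = 0; k < outLinks; k++) {
                line.append('\t').append(random.nextInt(numPages));
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : generate(NUM_PAGES, MAX_OUT_LINKS, SEED)) {
            System.out.println(line);
        }
    }
}
```

Whatever value is used for the number of pages here is the value the driver would later need to know as its input size.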

Once the data is generated, run PagerankMRDriver, making sure that the value of INPUT_SIZE matches the size of the input generated in the previous step. Running another MapReduce job just to figure out N seems like overkill here (and this is Java MapReduce, not Pig).
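To see why N has to be known up front, consider the textbook PageRank update, in which both the initial rank 1/N and the teleport term (1 - d)/N depend on the number of pages. The sketch below is a plain in-memory version of that update, not the repository's MapReduce code; the class `PageRankSketch`, the damping factor 0.85, and the handling of dangling pages are assumptions, and it assumes INPUT_SIZE plays the role of N.

```java
import java.util.Arrays;

public class PageRankSketch {
    // Commonly used damping factor; the repository's value is not stated in the README.
    static final double DAMPING = 0.85;

    /** One update step. links[i] lists the pages that page i links to. */
    static double[] step(int[][] links, double[] rank) {
        int n = links.length; // N: must be known before the first iteration
        double[] next = new double[n];
        double dangling = 0.0; // total rank held by pages with no out-links
        for (int i = 0; i < n; i++) {
            if (links[i].length == 0) {
                dangling += rank[i];
                continue;
            }
            // "Map" side: each page sends an equal share of its rank to its targets.
            double share = rank[i] / links[i].length;
            for (int target : links[i]) {
                next[target] += share;
            }
        }
        // "Reduce" side: combine received shares with the teleport term (1 - d) / N.
        for (int i = 0; i < n; i++) {
            next[i] = (1 - DAMPING) / n + DAMPING * (next[i] + dangling / n);
        }
        return next;
    }

    /** Runs a fixed number of steps starting from the uniform rank 1/N. */
    static double[] pageRank(int[][] links, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int k = 0; k < iterations; k++) {
            rank = step(links, rank);
        }
        return rank;
    }

    public static void main(String[] args) {
        // Tiny example graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}
        int[][] links = { {1, 2}, {2}, {0} };
        System.out.println(Arrays.toString(pageRank(links, 50)));
    }
}
```

In a MapReduce setting no single task sees the whole graph, so N cannot be read off an array length as it is here; it has to be supplied from outside or computed by an extra pass.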

With the current default values, the job should run as-is.

The generator is also cluster-aware and could be configured to generate a very large file.

I added a utility, ShowData, that prints the contents of a sequence file to the console. For a very large sequence file, this may take a while.

License: Apache License 2.0


Languages

Language: Java 100.0%