yhfyhf / PageRank-Hadoop

Calculate PageRank with Hadoop Streaming.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

PageRank-Hadoop

PageRank Implementation in MapReduce. Calculate PageRank with Hadoop Streaming.

Files

nodes: Nodes with their links.

pr: Nodes with their initial PageRanks.

sortMapper.py, sortReducer: Actually does little things, but just sort outputs.

prMapper.py: Maps node to its contributed rank from nodes it is linked from.

prReduce.py: Reduces and calculate new PageRanks.

config.py: N is the number of nodes. alpha is the damping factor and is usualy set to 0.85.

Run test

./test

Run with Hadoop Streaming

./hadoop-run

About

Calculate PageRank with Hadoop Streaming.


Languages

Language:Python 60.8%Language:Shell 39.2%