xuanyuansen / HITS_Algorithm

Different implementations and comparisons of the HITS (Hubs and Authorities) algorithm in Pig and Spark, using Hive for data preparation.

HITS_Algorithm

Implementations of the Hubs and Authorities (HITS) algorithm in Apache Spark and Apache Pig.
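For reference, these are the standard HITS update rules (Kleinberg's formulation). Each page $p$ has an authority score $a(p)$ and a hub score $h(p)$. A page's authority is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to. Both vectors are normalized after every iteration. The choice of L2 normalization here is an assumption; the implementations in this repository may normalize differently.

```latex
\begin{aligned}
a^{(k+1)}(p) &= \sum_{q \,:\, q \to p} h^{(k)}(q) \\
h^{(k+1)}(p) &= \sum_{q \,:\, p \to q} a^{(k+1)}(q) \\
a^{(k+1)} &\leftarrow \frac{a^{(k+1)}}{\lVert a^{(k+1)} \rVert_2}, \qquad
h^{(k+1)} \leftarrow \frac{h^{(k+1)}}{\lVert h^{(k+1)} \rVert_2}
\end{aligned}
```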

Data

The data is the page-link graph of Wikipedia. Its source and a description of the files are available at the link below:

http://haselgrove.id.au/wikipedia.htm
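The following is a minimal sketch of flattening the link data into an edge list. It assumes the link file stores one adjacency list per line in the form `from: to1 to2 ...` with integer page IDs; check the description at the link above before relying on this. `parse_links` is a hypothetical helper and is not part of this repository. In the repository itself, this flattening step is handled by the Hive code.

```python
def parse_links(lines):
    """Turn adjacency lines of the (assumed) form 'from: to1 to2 ...'
    into a list of (source, destination) integer edge tuples."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split once on the colon: left side is the source page ID,
        # right side is a whitespace-separated list of destination IDs.
        src, _, targets = line.partition(":")
        for dst in targets.split():
            edges.append((int(src), int(dst)))
    return edges


print(parse_links(["1: 2 3", "2: 3", "3:"]))
```

A page with no outgoing links (such as `3:` above) contributes no edges. It still appears in the graph if other pages link to it.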

Overview

The Hive directory contains code for loading the data into Hive tables and transforming those tables into an edge list.

The Pig_Implementation directory contains code that implements the algorithm in Apache Pig.

The Spark_Implementation directory contains code that implements the algorithm in Apache Spark.
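Both implementations presumably consume the edge list produced by the Hive step. The following is a single-machine Python sketch of the same edge-list formulation of HITS, not the repository's Pig or Spark code. Each iteration has two passes. The first groups edges by destination and sums hub scores, and the second groups edges by source and sums authority scores. These passes correspond roughly to the join and group-by steps a distributed job would perform. The function name `hits`, the fixed count of 20 iterations, the all-ones initialization, and L2 normalization are all assumptions.

```python
import math
from collections import defaultdict


def hits(edges, iterations=20):
    """Compute (hub, authority) score dicts from a list of (src, dst) edges."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}  # assumed all-ones starting vector
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority pass: each edge (src, dst) adds src's hub score to dst.
        new_auth = defaultdict(float)
        for src, dst in edges:
            new_auth[dst] += hub[src]
        auth = {n: new_auth[n] for n in nodes}  # pages with no in-links get 0
        # Hub pass: each edge (src, dst) adds dst's updated authority to src.
        new_hub = defaultdict(float)
        for src, dst in edges:
            new_hub[src] += auth[dst]
        hub = {n: new_hub[n] for n in nodes}  # pages with no out-links get 0
        # L2-normalize both vectors so the scores do not grow without bound.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values()))
            if norm > 0:
                for n in scores:
                    scores[n] /= norm
    return hub, auth


hub, auth = hits([(1, 3), (2, 3), (1, 4)])
print(sorted(auth.items()))
print(sorted(hub.items()))
```

In this small graph, page 3 receives links from both 1 and 2, so it ends up with a higher authority score than page 4. Page 1 links to both authorities and therefore has a higher hub score than page 2.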


Languages

Python 63.6%, PigLatin 35.5%, Shell 0.8%