penguin138 / hadoop-tasks

Hadoop tasks repository for the Parallel and Distributed Computing course at MIPT, 2015



hadoop-tasks


Contains code for the following tasks:

  • Word Count
  • Inverted Index
  • Matrix Multiplication
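All three tasks are classic MapReduce exercises. The plain-Java sketch that follows illustrates the data flow of the first two. It is not the repository's Hadoop code, and the class and method names are hypothetical. The sketch simulates map, shuffle and reduce in memory. Word Count emits (word, 1) and sums the ones for each word. Inverted Index emits (word, document id) and collects the distinct ids for each word. It also makes two further assumptions: tokens are produced by lower-casing and splitting on non-word characters, and document ids are list positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class MapReduceSketch {

    // Hypothetical tokenizer: lower-case, split on non-word characters, drop empty tokens.
    private static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        for (String t : line.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Word Count: map emits (word, 1); the TreeMap groups values by sorted key (the "shuffle");
    // reduce sums the ones collected for each word.
    public static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String token : tokens(line)) {
                grouped.computeIfAbsent(token, k -> new ArrayList<>()).add(1); // map: emit (token, 1)
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int one : e.getValue()) sum += one; // reduce: sum the ones
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    // Inverted Index: map emits (word, docId); grouping into a TreeSet performs the shuffle
    // and the reduce (deduplicating and sorting the document ids) in one step.
    public static Map<String, TreeSet<Integer>> invertedIndex(List<String> docs) {
        Map<String, TreeSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String token : tokens(docs.get(docId))) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("the cat sat", "The dog sat down");
        System.out.println(wordCount(docs));     // {cat=1, dog=1, down=1, sat=2, the=2}
        System.out.println(invertedIndex(docs)); // {cat=[0], dog=[1], down=[1], sat=[0, 1], the=[0, 1]}
    }
}
```

In a real Hadoop job the grouping step is done by the framework between the Mapper and the Reducer. Here a sorted map stands in for it so that the example runs without a cluster.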

Speed-up achieved for Matrix Multiplication

On a 4-node Hadoop cluster, Matrix Multiplication of a 500x1000 matrix by a 1000x2000 matrix takes about 1.5 min, while a sequential version of the program written in Python takes about 5 min. That is a speed-up of roughly 5 / 1.5 ≈ 3.3x.
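The README does not say which MapReduce scheme the Matrix Multiplication task uses. The sketch that follows shows the textbook one-pass scheme for C = A·B, with A of size m×n and B of size n×p, simulated in memory in plain Java. It is an assumption, not the repository's implementation, and the names `MatrixMultiplySketch`, `Tagged` and `emit` are hypothetical. The mapper sends each A[i][j] to every output key (i, k) and each B[j][k] to every output key (i, k). The reducer for (i, k) pairs the values by the shared index j and sums the products.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MatrixMultiplySketch {

    // A value emitted by the mapper: source matrix tag, shared index j, and the entry itself.
    private static final class Tagged {
        final char matrix;
        final int j;
        final double value;
        Tagged(char matrix, int j, double value) { this.matrix = matrix; this.j = j; this.value = value; }
    }

    public static double[][] multiply(double[][] a, double[][] b) {
        int m = a.length, n = b.length, p = b[0].length;
        if (a[0].length != n) throw new IllegalArgumentException("inner dimensions differ");

        // Map phase: each entry is emitted to every output cell (i, k) whose dot product needs it.
        Map<Long, List<Tagged>> shuffled = new HashMap<>();
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < p; k++)
                    emit(shuffled, i, k, p, new Tagged('A', j, a[i][j]));
        for (int j = 0; j < n; j++)
            for (int k = 0; k < p; k++)
                for (int i = 0; i < m; i++)
                    emit(shuffled, i, k, p, new Tagged('B', j, b[j][k]));

        // Reduce phase: for key (i, k), match A[i][j] with B[j][k] by j and sum the products.
        double[][] c = new double[m][p];
        for (Map.Entry<Long, List<Tagged>> e : shuffled.entrySet()) {
            int i = (int) (e.getKey() / p), k = (int) (e.getKey() % p);
            double[] aRow = new double[n], bCol = new double[n];
            for (Tagged t : e.getValue()) {
                if (t.matrix == 'A') aRow[t.j] = t.value; else bCol[t.j] = t.value;
            }
            double sum = 0;
            for (int j = 0; j < n; j++) sum += aRow[j] * bCol[j];
            c[i][k] = sum;
        }
        return c;
    }

    // Encodes the composite key (i, k) as one long, standing in for a Hadoop composite key.
    private static void emit(Map<Long, List<Tagged>> shuffled, int i, int k, int p, Tagged v) {
        shuffled.computeIfAbsent((long) i * p + k, key -> new ArrayList<>()).add(v);
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(a, b))); // [[19.0, 22.0], [43.0, 50.0]]
    }
}
```

This scheme needs only one job, but its map output has 2·m·n·p records. For the 500x1000 by 1000x2000 case above that is 2 × 10^9 records, so the shuffle volume is large. Two-pass or block-partitioned variants trade extra jobs for less data movement.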



Languages

Java 90.1%, Python 9.9%