tina31726 / Terasort-Hadoop-vs-Spark

TeraSort-on-Hadoop vs Spark

CS553 Cloud Computing. Performing TeraSort on EC2 in AWS. Methods used for TeraSort:

1. Shared memory (external sort using merge sort)
2. Hadoop
3. Spark
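
The shared-memory method relies on an external merge sort, which can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the function name `external_sort`, the `chunk_lines` parameter, and the assumption that records are newline-terminated text lines compared as whole strings are all hypothetical.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(input_path, output_path, chunk_lines=100_000):
    """Sort a text file line by line while holding only one chunk in memory.

    Hypothetical helper for illustration; records are assumed to be text lines.
    """
    run_paths = []
    try:
        # Phase 1: read fixed-size chunks, sort each in memory, spill to a run file.
        with open(input_path, "r") as src:
            while True:
                chunk = list(islice(src, chunk_lines))
                if not chunk:
                    break
                # Ensure every record ends with a newline so merged output stays line-delimited.
                chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
                chunk.sort()
                fd, path = tempfile.mkstemp(suffix=".run")
                with os.fdopen(fd, "w") as run:
                    run.writelines(chunk)
                run_paths.append(path)

        # Phase 2: k-way merge of the sorted runs; heapq.merge reads them lazily.
        runs = [open(p, "r") for p in run_paths]
        try:
            with open(output_path, "w") as dst:
                dst.writelines(heapq.merge(*runs))
        finally:
            for f in runs:
                f.close()
    finally:
        # Remove the temporary run files whether or not the merge succeeded.
        for p in run_paths:
            os.remove(p)
```

A smaller `chunk_lines` lowers peak memory but produces more run files to merge, so the value would normally be tuned to the memory available on the instance.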

This project sets up a Hadoop cluster and a Spark-on-YARN cluster, then runs a sorting program on MapReduce and on Spark RDDs to compare the performance of Hadoop and Spark.
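
Distributed sorts of this kind commonly depend on range partitioning: keys are routed to partitions by sampled boundary keys, each partition is sorted independently, and the partitions concatenated in order form a globally sorted result. The sketch below shows that idea in plain Python on a single machine. It is not the repository's MapReduce or Spark code, and the names `choose_split_points` and `range_partitioned_sort` are hypothetical.

```python
import bisect
import random


def choose_split_points(keys, num_partitions, sample_size=1000, seed=0):
    """Pick num_partitions - 1 boundary keys from a random sample of the input."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if not sample:
        return []
    # Evenly spaced quantiles of the sample approximate balanced key ranges.
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]


def range_partitioned_sort(keys, num_partitions=4):
    """Return a list of sorted partitions whose concatenation is globally sorted."""
    splits = choose_split_points(keys, num_partitions)
    partitions = [[] for _ in range(num_partitions)]
    # "Shuffle" step: route each key to the partition whose key range contains it.
    for key in keys:
        partitions[bisect.bisect_right(splits, key)].append(key)
    # "Reduce" step: each partition is sorted independently (in parallel on a real cluster).
    for part in partitions:
        part.sort()
    return partitions


if __name__ == "__main__":
    rng = random.Random(42)
    data = ["%010d" % rng.randrange(10**10) for _ in range(10_000)]
    parts = range_partitioned_sort(data, num_partitions=4)
    print([len(p) for p in parts])
```

Because every key in one partition is smaller than every key in the next, no final merge is needed, which is what lets each worker write its output independently.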

Technology: Java and Python

Languages

C 78.2%, Java 15.4%, Python 3.8%, Shell 2.6%