Spark RDD with Shakespeare
This project demonstrates Spark's advantages over classic MapReduce and the efficiency of RDD-based processing when a large amount of data is involved. Spark exposes map and reduce operations directly as RDD transformations, letting us treat the data as a pipeline of lazily evaluated steps. Here we analyze a large corpus of Shakespeare's text to find the most frequently used words, running the count both with stop words included and with them filtered out.
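The word count described above can be sketched with a few RDD transformations. This is a minimal illustration, not the project's actual `main.py`: the file name `shakespeare.txt`, the helper names, and the small stop-word set are assumptions for the example.

```python
import re

# A small illustrative stop-word set; a real run would use a fuller list
# (e.g. from NLTK). This is an assumption, not the project's actual list.
STOP_WORDS = {"the", "and", "to", "of", "a", "i", "in", "that", "is", "it", "or", "not"}

def tokenize(line):
    """Lowercase a line and split it into alphabetic words."""
    return re.findall(r"[a-z']+", line.lower())

def word_counts(sc, path, remove_stop_words=False):
    """Build an RDD of (word, count) pairs, most frequent first.

    Each step is a lazy RDD transformation; nothing runs until an
    action such as take() or collect() is called.
    """
    words = sc.textFile(path).flatMap(tokenize)       # lines -> words
    if remove_stop_words:
        words = words.filter(lambda w: w not in STOP_WORDS)
    return (words.map(lambda w: (w, 1))               # word -> (word, 1)
                 .reduceByKey(lambda a, b: a + b)     # sum counts per word
                 .sortBy(lambda kv: kv[1], ascending=False))

# Usage (with pyspark installed and shakespeare.txt present):
#   from pyspark import SparkContext
#   sc = SparkContext("local[*]", "ShakespeareWordCount")
#   top20 = word_counts(sc, "shakespeare.txt", remove_stop_words=True).take(20)
#   sc.stop()
```

Running the pipeline twice, once with `remove_stop_words=True` and once without, gives the two frequency views the project compares.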
Setup
- `pip install pyspark` (and its dependencies)
- run `python main.py`