Spark RDD with Shakespeare
This project demonstrates Spark's advantages over classic MapReduce and the efficiency of RDD-based processing when a large amount of data is involved. Spark exposes map and reduce operations directly as RDD transformations, letting us treat the data as a pipeline of lazily evaluated steps. Here we analyze a large corpus of Shakespeare's text to find the most frequently used words, running the count both with stop words included and with them filtered out.
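The word count described above can be sketched with a few RDD transformations. This is a minimal illustration, not the project's actual `main.py`: the file name `shakespeare.txt`, the helper names, and the small stop-word set are assumptions for the example.

```python
import re

# A small illustrative stop-word set; a real run would use a fuller list
# (e.g. from NLTK). This is an assumption, not the project's actual list.
STOP_WORDS = {"the", "and", "to", "of", "a", "i", "in", "that", "is", "it", "or", "not"}

def tokenize(line):
    """Lowercase a line and split it into alphabetic words."""
    return re.findall(r"[a-z']+", line.lower())

def word_counts(sc, path, remove_stop_words=False):
    """Build an RDD of (word, count) pairs, most frequent first.

    Each step is a lazy RDD transformation; nothing runs until an
    action such as take() or collect() is called.
    """
    words = sc.textFile(path).flatMap(tokenize)       # lines -> words
    if remove_stop_words:
        words = words.filter(lambda w: w not in STOP_WORDS)
    return (words.map(lambda w: (w, 1))               # word -> (word, 1)
                 .reduceByKey(lambda a, b: a + b)     # sum counts per word
                 .sortBy(lambda kv: kv[1], ascending=False))

# Usage (with pyspark installed and shakespeare.txt present):
#   from pyspark import SparkContext
#   sc = SparkContext("local[*]", "ShakespeareWordCount")
#   top20 = word_counts(sc, "shakespeare.txt", remove_stop_words=True).take(20)
#   sc.stop()
```

Running the pipeline twice, once with `remove_stop_words=True` and once without, gives the two frequency views the project compares.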
Setup
- `pip install pyspark` (and its dependencies)
- run `python main.py`