QuantumBoy-729 / SparkRDD-DS-Project

SparkRDD with Shakespeare

SparkRDD-project

This project demonstrates Spark's advantages over MapReduce and how efficiently RDDs (Resilient Distributed Datasets) can be processed when large amounts of data are involved. Spark provides map and reduce operations directly, chaining them over the data as a pipeline, and RDDs can be kept in memory between steps so the data is reused efficiently. Here we use a large body of Shakespeare's text to find which words he used most frequently, processing the text separately with stop words included and with stop words removed.
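The counting logic described above can be sketched in plain Python so it runs without a Spark installation. Each step is commented with the RDD operation it stands in for, and the equivalent PySpark chain is shown as a comment. This is a minimal sketch, not the project's main.py: the tokenizer regex, the deliberately tiny `STOP_WORDS` set, and the helper names `tokenize`, `word_counts` and `top_words` are all assumptions.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list; the project's real list is not shown.
STOP_WORDS = frozenset({"the", "and", "i", "to", "of", "a", "you", "my", "that", "in"})

# Assumed tokenizer: lower-case letters plus apostrophes (keeps "o'er", "'tis").
WORD_RE = re.compile(r"[a-z']+")

# PySpark equivalent (assumes a SparkContext `sc` and a text-file `path`):
#   (sc.textFile(path)
#      .flatMap(tokenize)
#      .filter(lambda w: w not in STOP_WORDS)   # only in the "no stop words" run
#      .map(lambda w: (w, 1))
#      .reduceByKey(operator.add)
#      .takeOrdered(n, key=lambda kv: (-kv[1], kv[0])))


def tokenize(line):
    """Lower-case one line and split it into words (the flatMap step)."""
    return WORD_RE.findall(line.lower())


def word_counts(lines, drop_stop_words=False):
    """Count words the way the RDD pipeline would, but locally in plain Python."""
    counts = defaultdict(int)
    for line in lines:                      # each RDD element is one line of text
        for word in tokenize(line):         # flatMap: line -> words
            if drop_stop_words and word in STOP_WORDS:
                continue                    # filter: optionally drop stop words
            counts[word] += 1               # map to (word, 1), then reduceByKey(add)
    return dict(counts)


def top_words(counts, n=10):
    """Return the n most frequent words, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Called on the lines "To be, or not to be: that is the question." and "The question is to be.", the full count ranks `be` and `to` first with 3 each, while the run with `drop_stop_words=True` discards `to`, `that` and `the`. In Spark, `reduceByKey` is the usual choice for counting because it combines values within each partition before shuffling, unlike `groupByKey`.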

Setup

1. Install the environment

  • Install PySpark and its dependencies: pip install pyspark

2. Run the analysis

  • python main.py
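Before running main.py, a small pre-flight check can confirm the environment looks ready. This is a hypothetical helper, not part of the repository; the Java checks reflect the general fact that PySpark runs on the JVM and needs a Java runtime, which the setup steps do not spell out.

```python
import importlib.util
import os
import shutil
import sys


def check_environment():
    """Report whether the pieces needed to run main.py appear to be present.

    Assumption: a Java runtime is found either via a `java` executable on
    PATH or via the JAVA_HOME environment variable; no version is checked.
    """
    return {
        "python": sys.version.split()[0],                      # e.g. "3.11.4"
        "pyspark_installed": importlib.util.find_spec("pyspark") is not None,
        "java_on_path": shutil.which("java") is not None,
        "java_home_set": bool(os.environ.get("JAVA_HOME")),
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

If pyspark_installed is False, rerun pip install pyspark; if both Java checks are False, install a Java runtime before starting Spark.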

Languages

Python 91.2%, Shell 8.8%