PUC Big Data workshop assignment

Docker

We use Jupyter's docker image pyspark-notebook. To run it, execute:

docker run -p 8888:8888 jupyter/pyspark-notebook:latest start-notebook.sh --NotebookApp.token=""

Part #1

A simple counting algorithm; given a list of words, use MapReduce paradigms to obtain individual word counts. Maybe display some different approaches?

The students will be required to fill in particular critical parts of code.

Part #2

(TODO)

A slightly more complicated algorithm; obtain the average of a list of numbers using MapReduce paradigms. Since the list is divided over individual workers, this requires some rethinking of the problem!

About

PUC Big Data workshop assignment repository

Languages

Language:Jupyter Notebook 100.0%