vittoriopolverino / mapreduce-wordcount

MapReduce python implementation

MapReduce Word Count

A naive Python implementation (single process, no distributed computing) that mimics the MapReduce paradigm in order to understand it.


🧐 About

MapReduce is a programming model for processing and generating big data sets with a parallel, distributed algorithm on a cluster. The "MapReduce System" is usually composed of three functions (or steps):

  • Map: The map function, also referred to as the map task, processes a single key/value input pair and produces a set of intermediate key/value pairs.
  • Shuffle: The shuffle function transfers the intermediate data from the mappers to the reducers, grouping the values by key. It is a mandatory step, because the shuffled output is the input of the reduce tasks and the reducers cannot start without it.
  • Reduce: The reduce function, also referred to as the reduce task, consists of taking all key/value pairs produced in the map phase that share the same intermediate key and producing zero, one, or more data items.
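
The three steps above can be sketched in a few lines of plain Python. This is a minimal single-process illustration, not the code shipped in this repository: the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the choice to lowercase words and split on whitespace are assumptions made for the example.

```python
from collections import defaultdict


def map_words(line):
    """Map: emit an intermediate (word, 1) pair for every word in one line."""
    # Assumption: words are whitespace-separated and compared case-insensitively.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group intermediate values by key, one group per reduce call."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_counts(key, values):
    """Reduce: collapse all values that share a key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence (hypothetical driver function)."""
    intermediate = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(intermediate)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
```

In a real MapReduce system each `map_words` call and each `reduce_counts` call could run on a different machine; here they run one after another, which keeps the data flow easy to follow.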

🏁 Getting Started

Use the Pipfile to install packages in the virtualenv:

pipenv install
pipenv install --dev

💻 Usage

Run the MapReduce example:

pipenv run wordcount

Test

Run the unit and integration tests:

pipenv run test
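
As an illustration of what a unit test for a word count might look like, here is a small sketch written as plain `test_*` functions with bare assertions, a style that runners such as pytest can collect. The test runner behind `pipenv run test` is not stated here, and `count_words` is a hypothetical stand-in defined only for this example, not a function from this repository.

```python
from collections import Counter


def count_words(lines):
    """Hypothetical stand-in for the project's word count entry point."""
    counts = Counter()
    for line in lines:
        # Assumption: whitespace-separated, case-insensitive words.
        counts.update(word.lower() for word in line.split())
    return dict(counts)


def test_counts_repeated_words_across_lines():
    result = count_words(["hello world", "Hello again"])
    assert result == {"hello": 2, "world": 1, "again": 1}


def test_empty_input_gives_empty_result():
    assert count_words([]) == {}


def test_blank_lines_are_ignored():
    assert count_words(["", "   "]) == {}
```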

⛏️ Built Using


✏️ Authors

License: Apache License 2.0


Languages: Python 100.0%