iandewancker / multipass

Multipass is a parallelization framework for running map-reduce-like Python jobs on a single machine. It is particularly advantageous for computationally expensive machine learning tasks on medium-sized datasets (tens of GBs).
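
To make the map-reduce-on-one-machine idea concrete, here is a minimal sketch using only the Python standard library. It does not use Multipass's own API, which this page does not show; the function names (`split`, `map_chunk`, `combine`, `parallel_mean`) and the choice of computing a mean are assumptions made purely for illustration.

```python
from functools import reduce
from multiprocessing import Pool


def map_chunk(chunk):
    """Map step: compute a partial (sum, count) for one chunk of numbers."""
    return sum(chunk), len(chunk)


def combine(left, right):
    """Reduce step: merge two partial (sum, count) results."""
    return left[0] + right[0], left[1] + right[1]


def split(data, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_mean(data, n_workers=4):
    """Mean of a non-empty list, with the map step run in worker processes."""
    chunks = split(data, n_workers)
    with Pool(processes=n_workers) as pool:
        partials = pool.map(map_chunk, chunks)  # one task per chunk
    total, count = reduce(combine, partials, (0, 0))
    return total / count


if __name__ == "__main__":
    print(parallel_mean(list(range(1, 101))))  # prints 50.5
```

Each worker process handles one chunk independently, and only the small partial results are sent back to the parent for the reduce step. That pattern is what makes this style of job a reasonable fit for a single multi-core machine.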

TODO:

optionally reshuffle the data
pass in MAP_OUTPUT, MAP_INPUT, DATAFILE, UID, ITERATION_COUNT

Languages

Python 100.0%