TODO:
optionally reshuffle of data pass in MAP_OUTPUT, MAP_INPUT, DATAFILE, UID, ITERATION_COUNT
Multipass is a parallelization framework for running map-reduce-like python jobs on a single machine. Particularly advantageous for machine learning tasks with medium-sized datasets (10s of GBs) that are computationally expensive.
TODO:
optionally reshuffle of data pass in MAP_OUTPUT, MAP_INPUT, DATAFILE, UID, ITERATION_COUNT
Multipass is a parallelization framework for running map-reduce-like python jobs on a single machine. Particularly advantageous for machine learning tasks with medium-sized datasets (10s of GBs) that are computationally expensive.