This implementation is a scale model of the MapReduce idea, based on the following types and this example.
- The input data is a `List[Inp]`; each element is passed to a Mapper, producing an association map from `MapKey` to `Val`.
- These `(MapKey, Val)` pairs are passed to a shuffle function to produce a new key (`ShuffleKey`). All `Val` associated with the same `ShuffleKey` are gathered in a list for the next step.
- Then these `(ShuffleKey, List[Val])` pairs are passed to a reducer function, giving `(ShuffleKey, Result)` associations.
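The three steps above can be sketched as a single sequential function. This is a minimal illustration under assumptions, not the project's actual code: the object name `SimpleMapReduce`, the function `mapReduce`, and the `wordCount` helper are hypothetical names chosen here; only the type names `Inp`, `MapKey`, `Val`, `ShuffleKey` and `Result` come from the description.

```scala
object SimpleMapReduce {
  // Sequential, in-memory pipeline: map, then shuffle, then reduce.
  // (Hypothetical signature, derived from the types described above.)
  def mapReduce[Inp, MapKey, Val, ShuffleKey, Result](
      input: List[Inp],
      mapper: Inp => Map[MapKey, Val],
      shuffler: (MapKey, Val) => ShuffleKey,
      reducer: (ShuffleKey, List[Val]) => (ShuffleKey, Result)
  ): Map[ShuffleKey, Result] = {
    // Map phase: every input element yields (MapKey, Val) pairs.
    val mapped: List[(MapKey, Val)] = input.flatMap(inp => mapper(inp).toList)

    // Shuffle phase: gather all Val that share the same ShuffleKey.
    val shuffled: Map[ShuffleKey, List[Val]] =
      mapped
        .groupBy { case (k, v) => shuffler(k, v) }
        .map { case (sk, pairs) => sk -> pairs.map(_._2) }

    // Reduce phase: one call per (ShuffleKey, List[Val]) pair.
    shuffled.map { case (sk, vals) => reducer(sk, vals) }
  }

  // WordCount, assuming one line of text per input element.
  def wordCount(lines: List[String]): Map[String, Int] =
    mapReduce[String, String, Int, String, Int](
      lines,
      // Mapper: count the words within a single line.
      line =>
        line.split("\\s+").filter(_.nonEmpty)
          .groupBy(identity)
          .map { case (w, ws) => w -> ws.length },
      // Shuffler: the word itself is the shuffle key.
      (word, _) => word,
      // Reducer: sum the per-line counts.
      (word, counts) => (word, counts.sum)
    )
}
```

Note that `mapped` and `shuffled` are both fully materialized, which is exactly the RAM limitation discussed next.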
The work is dispatched sequentially to instances instead of parallelized on different workers.
All data is kept in RAM, effectively limiting the data size.
The reducer works on a complete list. However, if we look at the reduce function (`foldLeft`), the elements could be passed in one at a time, without having to store the complete list.
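To illustrate the point about `foldLeft`, here is a small sketch (the names `IncrementalReduce`, `sumOfList` and `sumOfStream` are made up for this example). The same fold works on an `Iterator`, in which case only the running result is kept between steps.

```scala
object IncrementalReduce {
  // List-based reduction: the whole List[Int] must exist before the call.
  def sumOfList(values: List[Int]): Int =
    values.foldLeft(0)(_ + _)

  // The same fold over an Iterator: values are consumed one at a time
  // and only the running total is retained.
  def sumOfStream(values: Iterator[Int]): Int =
    values.foldLeft(0)(_ + _)
}
```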
!!! This isn’t complete… see ActorMapReduce.scala !!!
Rearranging the types allows the shuffler to forward the data immediately to the reducer:
- Mapper `Inp => Iterator[(MapKey, Val)]`: using an `Iterator` instead of a `Map` obviates the need to store the data.
- Shuffler `(MapKey, Val) => ShuffleKey`
- Reducer `ShuffleKey => Result => Val => Result`
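A possible driver for these rearranged types is sketched below. It is not taken from ActorMapReduce.scala; the names `StreamingMapReduce`, `mapReduce` and `wordCount` are hypothetical, and the `initial` parameter is an assumption added here because a fold needs a start value for each key.

```scala
import scala.collection.mutable

object StreamingMapReduce {
  // Each (MapKey, Val) pair is shuffled and folded into the running
  // result for its key as soon as the mapper emits it.
  def mapReduce[Inp, MapKey, Val, ShuffleKey, Result](
      input: Iterator[Inp],
      mapper: Inp => Iterator[(MapKey, Val)],
      shuffler: (MapKey, Val) => ShuffleKey,
      reducer: ShuffleKey => Result => Val => Result,
      initial: ShuffleKey => Result // assumption: start value per key
  ): Map[ShuffleKey, Result] = {
    // Only one running Result per key is stored, never a List[Val].
    val results = mutable.Map.empty[ShuffleKey, Result]
    input.foreach { inp =>
      mapper(inp).foreach { case (key, value) =>
        val sk = shuffler(key, value)
        val soFar = results.getOrElse(sk, initial(sk))
        results.update(sk, reducer(sk)(soFar)(value))
      }
    }
    results.toMap
  }

  // WordCount with the curried reducer: add each emitted 1 to the total.
  def wordCount(lines: Iterator[String]): Map[String, Int] =
    mapReduce[String, String, Int, String, Int](
      lines,
      line => line.split("\\s+").iterator.filter(_.nonEmpty).map(w => (w, 1)),
      (word, _) => word,
      _ => total => one => total + one,
      _ => 0
    )
}
```

The mutable map is a shortcut for the sketch; the design point is only that memory use grows with the number of distinct keys rather than with the number of values.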
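The proposed reducer isn’t equivalent to `(ShuffleKey, List[Val]) => (ShuffleKey, Result)`. Assume the reducer has a `Result` type of `Either[Int, String]`: let’s say, in the WordCount example, that if the count is even we want the result as a number (e.g. `Left(2)`), and if the count is odd we want a String (e.g. `Right("3")`). Whether the count is even or odd is only known once every value has been seen, so the running value that the incremental reducer carries between steps does not naturally have the type `Result`.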
To account for such cases, the reducer (with a new type `ShuffleKey => TempResult => Val => TempResult`) must be combined with a post-processing step (`(ShuffleKey, TempResult) => Result`).
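The even/odd WordCount variant could then look roughly like this. This is a sketch under assumptions: the object name `ParityWordCount`, the helper `reduceKey`, and the choice of `TempResult = Int` are illustrative and not taken from the original code.

```scala
object ParityWordCount {
  type ShuffleKey = String
  type Val = Int
  type TempResult = Int               // assumption: the running count
  type Result = Either[Int, String]

  // Incremental reducer: only the running count is carried along.
  val reducer: ShuffleKey => TempResult => Val => TempResult =
    _ => count => v => count + v

  // Post-processing: decide the final shape once the count is complete.
  val postProcess: (ShuffleKey, TempResult) => Result =
    (_, count) => if (count % 2 == 0) Left(count) else Right(count.toString)

  // Fold all values of one key, then post-process the final count.
  def reduceKey(key: ShuffleKey, values: Iterator[Val]): Result =
    postProcess(key, values.foldLeft(0)((acc, v) => reducer(key)(acc)(v)))
}
```

For instance, `ParityWordCount.reduceKey("a", Iterator(1, 1))` yields `Left(2)`, while three occurrences yield `Right("3")`.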
This way, old-style reducer functions can still be expressed with `TempResult = List[Val]`, the new-style reducer function accumulating XXX…TODO…XXX and the old-style reducer function applied in the post-processing step. More restricted reducer functions (the WordCount example is one) can work with a trivial post-processing step (the identity function), without building up the complete `List[Val]`.