- This is a map-reduce implementation in C, based on the Google MapReduce paper.
- It targets a single machine, not a cluster of machines (the paper's original intended purpose).
- To build, run:
  $ ./build.sh
- Run-time arguments:
  - input filename
  - number of map worker threads
  - number of reduce worker threads
- This is the OSTEP project on map-reduce. Original repo: ostep-projects (Operating Systems: Three Easy Pieces).
- Assumes that `argv[1]` ... `argv[n-1]` (with `argc` equal to `n`) all contain file names that are passed to the mappers.
- `map()` and `reduce()` are user functions.
- `MR_Emit()` emits the intermediate keys produced by the map function.
- Uses data structures that are shared by both mapper and reducer threads:
  - Mapper threads populate the data structures.
  - Reducer threads consume the data structures.
  - Use a read-write lock to guard access.
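
As an illustration of the shared structure described above, here is a minimal sketch of a partitioned key/value store with one read-write lock per partition. The names (`store_init`, `store_emit`, `store_count_key`, `hash_partition`, `NUM_PARTITIONS`) and the linked-list layout are assumptions made for this example, not this repo's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NUM_PARTITIONS 4 /* hypothetical partition count */

typedef struct kv_node {
    char *key;
    char *value;
    struct kv_node *next;
} kv_node;

typedef struct {
    kv_node *head;         /* unsorted singly linked list of pairs */
    pthread_rwlock_t lock; /* writers: mappers, readers: reducers */
} partition;

static partition store[NUM_PARTITIONS];

/* djb2-style string hash, reduced to a partition index. */
static unsigned long hash_partition(const char *key, int num_partitions) {
    unsigned long hash = 5381;
    int c;
    while ((c = (unsigned char)*key++) != '\0')
        hash = hash * 33 + (unsigned long)c;
    return hash % (unsigned long)num_partitions;
}

static void store_init(void) {
    for (int i = 0; i < NUM_PARTITIONS; i++) {
        store[i].head = NULL;
        pthread_rwlock_init(&store[i].lock, NULL);
    }
}

/* Mapper side: copy the pair and push it under the write lock.
 * Returns 0 on success, -1 if memory allocation fails. */
static int store_emit(const char *key, const char *value) {
    assert(key != NULL && value != NULL);
    kv_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = strdup(key);
    n->value = strdup(value);
    if (n->key == NULL || n->value == NULL) {
        free(n->key);
        free(n->value);
        free(n);
        return -1;
    }
    partition *p = &store[hash_partition(key, NUM_PARTITIONS)];
    pthread_rwlock_wrlock(&p->lock);
    n->next = p->head;
    p->head = n;
    pthread_rwlock_unlock(&p->lock);
    return 0;
}

/* Reducer side: count the values stored for a key under the read lock. */
static size_t store_count_key(const char *key) {
    partition *p = &store[hash_partition(key, NUM_PARTITIONS)];
    size_t n = 0;
    pthread_rwlock_rdlock(&p->lock);
    for (kv_node *cur = p->head; cur != NULL; cur = cur->next)
        if (strcmp(cur->key, key) == 0)
            n++;
    pthread_rwlock_unlock(&p->lock);
    return n;
}
```

Locking per partition rather than globally lets mappers that hash to different partitions emit without contending with each other, while several reducers can read the same partition concurrently.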
- What we need:
  - A thread pool for mapper threads and reducer threads
  - Internal data structures used to pass key/value pairs from mappers to reducers
- The number of threads should be the number of cores/processors on the system + 1 for now (configurable).
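
The "cores + 1" default could be computed roughly as follows. This is a sketch only: `worker_count` is a hypothetical helper, and `_SC_NPROCESSORS_ONLN` is a widely available extension (Linux, macOS, the BSDs) rather than a name POSIX guarantees.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: how many worker threads to start.
 * requested > 0  : use the caller's value (the configurable part).
 * requested == 0 : default to online cores + 1. */
static long worker_count(long requested) {
    assert(requested >= 0);
    if (requested > 0)
        return requested;
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    if (cores < 1) /* sysconf failed or reported nothing usable */
        cores = 1;
    return cores + 1;
}
```

Passing `0` asks for the default, so the thread-count arguments can stay optional while still allowing an explicit override.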
- Optimization
  - Replace partitions with a hash table implementation
- [ ] Sort the lists in each partition by key, for the iterator
- [ ] Iterator to iterate over each partition's list
- [ ] Getter method to get all key/value pairs
- [ ] Thread pool for reducer and mapper threads
- [ ] Unoptimized iterator for the disk
- Mutex for the container Disk structure
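
For the sort, iterator, and getter items above, one possible shape is sketched below. It assumes a partition can be viewed as a flat array of pairs; `kv_pair`, `partition_iter`, `partition_sort`, `partition_peek_key`, and `partition_get_next` are hypothetical names for illustration, not the repo's API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *key;
    const char *value;
} kv_pair;

typedef struct {
    kv_pair *pairs; /* all pairs in one partition */
    size_t len;
    size_t cursor;  /* index of the next pair to hand out */
} partition_iter;

static int compare_by_key(const void *a, const void *b) {
    const kv_pair *x = a;
    const kv_pair *y = b;
    return strcmp(x->key, y->key);
}

/* Sort the partition by key so equal keys become adjacent. */
static void partition_sort(partition_iter *it) {
    assert(it != NULL);
    qsort(it->pairs, it->len, sizeof it->pairs[0], compare_by_key);
    it->cursor = 0;
}

/* Key at the cursor, or NULL once the partition is exhausted. */
static const char *partition_peek_key(const partition_iter *it) {
    return it->cursor < it->len ? it->pairs[it->cursor].key : NULL;
}

/* Getter: next value for `key`, or NULL when the cursor has moved
 * past the last pair with that key. */
static const char *partition_get_next(partition_iter *it, const char *key) {
    if (it->cursor < it->len && strcmp(it->pairs[it->cursor].key, key) == 0)
        return it->pairs[it->cursor++].value;
    return NULL;
}
```

A reducer loop would call `partition_peek_key` to pick the next key, then drain `partition_get_next` until it returns NULL. Because the array is sorted first, each key's values are contiguous and one forward pass is enough.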