This is our EE447 final project; the idea comes from the MIT 6.824 course project. Contributors are @sun-lingyu, @yifanlu0227, and @Nicholas0228.
- golang 1.15+
- crypto/ssh: go get golang.org/x/crypto/ssh@v0.0.0-20201221181555-eec23a3978ad
- python3-dev: sudo apt-get install python3-dev -y
- python2-dev: sudo apt-get install python2.7-dev -y
- nltk (for the word count and inverted index examples): pip3 install nltk ; pip2 install nltk==3.0.0
- numpy (for KNN example): pip install numpy
First, run git clone https://github.com/yifanlu0227/mapreduce.git to download this repository to your machines. Select one machine to be the coordinator and the others to be workers.
Then edit your workers' IP addresses, username, and password in mapreduce/src/main/mrcoordinator.go, as follows.
hosts := []string{"192.168.0.132", "192.168.0.184", "192.168.0.33", "192.168.0.199"}
command := "go run mrworker.go " + os.Args[1]
mr.AwakenWorkers("root", "Ydhlw123", hosts, command)
Also make sure that ports 1234 and 8081 are available, since we use them for our RPC and HTTP servers.
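One quick way to check the two ports before starting is to try binding to them. The snippet below is only a sketch and is not part of this repository: the helper name port_is_free is hypothetical, and it assumes the servers listen on all interfaces (0.0.0.0). Run it on each machine you plan to use.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can bind to (host, port) right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except socket.error:
        # Typically "address already in use" or "permission denied".
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    # 1234 and 8081 are the ports named above for the RPC and HTTP servers.
    for port in (1234, 8081):
        status = "free" if port_is_free(port) else "NOT available"
        print("port %d: %s" % (port, status))
```

The check is conservative: it does not set SO_REUSEADDR, so a port that was released only moments ago may briefly be reported as unavailable.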
Our MapReduce supports Python development, i.e., you only need to provide a simple Python file containing a map function and a reduce function. You can refer to the examples we provide, such as word count in mapreduce/src/main/wc.py.
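import string

import nltk


def map(name, contents):
    # Normalize case so that "The" and "the" count as the same word.
    lower = contents.upper()
    # Identity table: in Python 2, str.translate(table, deletechars) removes deletechars.
    remove = string.maketrans(string.punctuation, string.punctuation,)
    lower1 = lower.translate(remove, string.punctuation,)
    without_punctuation = lower1.translate(remove, string.digits,)
    tokens = nltk.word_tokenize(without_punctuation)
    # Emit one {word: "1"} pair per token.
    kva = []
    for p in tokens:
        lisdict = {}
        lisdict[p] = "1"
        kva.append(lisdict)
    return kva


def reduce(key, values):
    # values holds one "1" per occurrence of key.
    return str(len(values))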
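Before submitting a job to the cluster, it can help to dry-run a map/reduce pair in a single process. The sketch below is an illustration only and not how the framework itself executes jobs (the real shuffle happens across workers in Go). The helper run_locally is hypothetical, and the map function here is a simplified stand-in that splits on whitespace instead of using nltk. It assumes, as in wc.py, that map returns a list of single-entry dicts and that reduce receives a key together with the list of all values emitted for it.

```python
from collections import defaultdict


def map(name, contents):
    # Simplified stand-in for wc.py: whitespace split, no nltk needed.
    return [{word: "1"} for word in contents.upper().split()]


def reduce(key, values):
    # Same reduce as the word count example: count the occurrences.
    return str(len(values))


def run_locally(map_fn, reduce_fn, inputs):
    """Run map on every input, group values by key, then reduce each group.

    inputs maps a file name to its contents; the result maps key -> reduce output.
    """
    grouped = defaultdict(list)
    for name, contents in inputs.items():
        for pair in map_fn(name, contents):
            for key, value in pair.items():
                grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


if __name__ == "__main__":
    result = run_locally(map, reduce, {"a.txt": "to be or not to be"})
    for key in sorted(result):
        print("%s %s" % (key, result[key]))
```

If the dry run gives the expected counts on a small input, the same file can then be tried with the coordinator.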
To run this word count example with the input files pg-*.txt, run the following in a terminal:
go run mrcoordinator.go wc pg-*.txt
The KNN example:
go run mrcoordinator.go knn dataset*.txt
The inverted index example:
go run mrcoordinator.go inverted_index pg-*.txt
To see the output files, run:
cat mr-out-* | sort | more
[Images: word count; inverted index; KNN large dataset]
[Images: worker perspective; file perspective]
MIT 6.824