driver.py
- The Hadoop job starter file. It keeps count of iterations and checks for centroid convergence.
mapper.py
- Mapper file.
reducer.py
- Reducer file.
utils.py
- Utility functions.
preprocess.py
- The preprocessing script for the mall customers data. The output of this script is input1.txt.
input1.txt
- The output of the preprocessing script.
output1.txt
- The clustering output, which contains each centroid's id, count, objects and value.
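mapper.py and reducer.py are described only as the mapper and reducer files. As a rough illustration of what one centroid-update iteration (k-means style) over Hadoop Streaming might look like, here is a minimal sketch. It assumes that each input line is a comma-separated list of numeric features, that the current centroids are available to the mapper as a Python list (how driver.py actually passes them is not shown here), and that the mapper emits `centroid_id<TAB>features`. The functions `nearest_centroid`, `map_lines` and `reduce_lines` are hypothetical names; a real Streaming script would feed `sys.stdin` into them and print each result line.

```python
from itertools import groupby


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_lines(lines, centroids):
    """Mapper step: emit 'centroid_id<TAB>comma-separated features' for each point."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        point = [float(x) for x in line.split(",")]
        yield f"{nearest_centroid(point, centroids)}\t{line}"


def reduce_lines(lines):
    """Reducer step: input must be sorted by key (Hadoop does this in the shuffle).

    Emits 'centroid_id<TAB>count<TAB>new centroid values' per centroid.
    """
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for cid, group in groupby(parsed, key=lambda kv: kv[0]):
        points = [[float(x) for x in value.split(",")] for _, value in group]
        count = len(points)
        mean = [sum(col) / count for col in zip(*points)]
        values = ",".join(f"{v:.4f}" for v in mean)
        yield f"{cid}\t{count}\t{values}"


# Local run: sorted() stands in for Hadoop's shuffle/sort between map and reduce.
mapped = sorted(map_lines(["1,1", "2,2", "9,9", "10,10"], [[0.0, 0.0], [10.0, 10.0]]))
print(list(reduce_lines(mapped)))
# ['0\t2\t1.5000,1.5000', '1\t2\t9.5000,9.5000']
```

Hadoop sorts mapper output by key before the reduce phase, which is what lets `groupby` collect all points of one centroid into a single group.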
output2.txt
- This contains the labelling for output1.txt.
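driver.py is said to keep count of iterations and check for centroid convergence. A minimal sketch of such a loop follows, under stated assumptions: each reducer output line is taken to be `id<TAB>count<TAB>comma-separated centroid values` (the real output1.txt also lists the member objects, which this sketch ignores), convergence means every centroid moved by at most a tolerance `tol`, and the loop stops after `max_iter` iterations regardless. `run_iteration` is a hypothetical stand-in for launching one Hadoop job; the `tol` and `max_iter` defaults are illustrative only.

```python
import math


def parse_centroids(lines):
    """Parse reducer output 'id<TAB>count<TAB>v1,v2,...' into {id: [v1, v2, ...]}."""
    centroids = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        cid, _count, values = line.split("\t")
        centroids[cid] = [float(v) for v in values.split(",")]
    return centroids


def has_converged(old, new, tol=1e-4):
    """True when every centroid moved by at most `tol` (Euclidean distance).

    A changed set of centroid ids counts as not converged.
    """
    if old.keys() != new.keys():
        return False
    return all(math.dist(old[k], new[k]) <= tol for k in old)


def run_until_converged(run_iteration, initial, max_iter=20, tol=1e-4):
    """Repeat jobs until convergence; returns (centroids, iterations used)."""
    centroids = initial
    for iteration in range(1, max_iter + 1):
        new_centroids = parse_centroids(run_iteration(centroids))
        if has_converged(centroids, new_centroids, tol):
            return new_centroids, iteration
        centroids = new_centroids
    return centroids, max_iter


def fake_job(_centroids):
    """Stand-in for the Hadoop job: always returns the same reducer output."""
    return ["0\t2\t1.5,1.5", "1\t2\t9.5,9.5"]


final, iterations = run_until_converged(fake_job, {"0": [0.0, 0.0], "1": [10.0, 10.0]})
print(iterations)  # 2
```

Comparing id sets before distances keeps a centroid that lost all its points (and so vanished from the reducer output) from being mistaken for convergence.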
preprocess3.py
- The preprocessing script that prepares the input for problem 3 by combining input1.txt and the labels.
input3.txt
- The input to the kNN algorithm.
driver3.py
- The Hadoop job starter file. It keeps count of iterations and checks for centroid convergence.
mapper3.py
- Mapper file.
reducer3.py
- Reducer file, which also contains the logic for max voting and for analyzing the result.
output3.txt
- The result written by the reducer.
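reducer3.py is described as doing max voting and analyzing the result. A minimal sketch of kNN majority voting with an accuracy check follows. It assumes the mapper emits, for each test point, its distance to every training row of input3.txt together with that row's label, and that each test point's true label is known so accuracy can be computed. The names `map_distances`, `knn_vote` and `reduce_knn`, the record layout, and the choice of Euclidean distance are assumptions for illustration, not the project's actual code.

```python
import heapq
import math
from collections import Counter


def map_distances(test_points, train_rows):
    """Mapper side: for every test point, emit its distance to every training row.

    test_points: iterable of (test_id, features, true_label)
    train_rows:  iterable of (features, label), e.g. parsed from input3.txt
    """
    train_rows = list(train_rows)  # reused for every test point
    for test_id, features, true_label in test_points:
        for train_features, label in train_rows:
            yield test_id, math.dist(features, train_features), label, true_label


def knn_vote(neighbours, k):
    """Majority ('max') vote over the k nearest (distance, label) pairs.

    Ties go to the label seen first among the nearest neighbours, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    nearest = heapq.nsmallest(k, neighbours, key=lambda pair: pair[0])
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def reduce_knn(records, k):
    """Reducer side: returns ({test_id: predicted_label}, accuracy)."""
    grouped, truth = {}, {}
    for test_id, distance, label, true_label in records:
        grouped.setdefault(test_id, []).append((distance, label))
        truth[test_id] = true_label
    predictions = {tid: knn_vote(pairs, k) for tid, pairs in grouped.items()}
    correct = sum(predictions[tid] == truth[tid] for tid in predictions)
    accuracy = correct / len(predictions) if predictions else 0.0
    return predictions, accuracy


train = [([1.0, 1.0], "0"), ([1.5, 2.0], "0"), ([9.0, 9.0], "1"), ([10.0, 9.5], "1")]
test = [("t1", [1.2, 1.1], "0"), ("t2", [9.5, 9.0], "1"), ("t3", [2.0, 2.0], "1")]
predictions, accuracy = reduce_knn(map_distances(test, train), k=3)
print(predictions)  # {'t1': '0', 't2': '1', 't3': '0'}
print(round(accuracy, 3))  # 0.667
```

An odd `k` avoids most ties in a two-label problem; the tie-breaking rule above only matters when labels split evenly among the k neighbours.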