This is an implementation of the k-means clustering algorithm for distributed systems using Hadoop. You must specify the number of clusters K manually when launching the program.
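For readers new to the approach, here is a minimal sketch of how one k-means iteration can be expressed as a map step and a reduce step. It is plain Python for illustration only, not the Hadoop code in this repository; the function names (`map_point`, `reduce_cluster`, `kmeans_iteration`) and the choice to keep an empty cluster's old centroid are assumptions made for this example.

```python
from collections import defaultdict
from math import dist


def map_point(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_cluster(values):
    """Reduce step: average all points assigned to one centroid."""
    total = None
    count = 0
    for point, n in values:
        if total is None:
            total = list(point)
        else:
            total = [a + b for a, b in zip(total, point)]
        count += n
    return tuple(s / count for s in total)


def kmeans_iteration(points, centroids):
    """Run one map/shuffle/reduce round and return the updated centroids."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        key, value = map_point(p, centroids)
        groups[key].append(value)
    # Assumption: a centroid with no assigned points keeps its old position.
    return [
        reduce_cluster(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]
```

Calling `kmeans_iteration` repeatedly until the centroids stop moving gives the usual k-means loop; in a distributed setting each round would typically be one MapReduce job.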
You can generate a dataset of N points using datasetgen.py. You must also specify the number of clusters K and the standard deviation STD. The command has the form:
python datasetgen.py N K STD
For example:
python datasetgen.py 1000 3 0.45
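As a rough idea of what a generator with these three parameters might do, the sketch below draws N points from K Gaussian blobs with standard deviation STD. This is not the actual datasetgen.py: the function name `generate_points`, the 2D points, the 0 to 10 range for the cluster centers, and the `seed` parameter are all assumptions made for this example.

```python
import random


def generate_points(n, k, std, seed=None):
    """Return n 2D points spread around k random centers (hypothetical sketch)."""
    rng = random.Random(seed)
    # Assumption: centers are drawn uniformly from a 10 x 10 square.
    centers = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(k)]
    points = []
    for i in range(n):
        cx, cy = centers[i % k]  # assign points to centers round-robin
        points.append((rng.gauss(cx, std), rng.gauss(cy, std)))
    return points
```

With the arguments from the example command, `generate_points(1000, 3, 0.45)` would return 1000 points grouped around 3 centers.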
If you are running the program on a single node for testing purposes, you can also plot the result using:
python plot.py
We also made:
- Sequential version in C++
- GPU accelerated version in CUDA/C++
Parallel Computing - Computer Engineering Master's Degree @ University of Florence.