sim-pez / k_means_distributed

K-Means algorithm for distributed systems


Intro

This is an implementation of the k-means clustering algorithm for distributed systems, built on Hadoop. The number of clusters K must be specified manually when launching the program.
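For readers new to the algorithm, here is a minimal single-process sketch in Python of how one k-means iteration splits into a map step and a reduce step. It is not the repository's Hadoop code; the function names (map_point, reduce_cluster, kmeans_iteration, kmeans) and the convergence tolerance are assumptions made for illustration.

```python
from collections import defaultdict
from math import dist


def map_point(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_cluster(points):
    """Reduce step: the new centroid is the coordinate-wise mean of the points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """Run one map/reduce round and return the updated centroids."""
    groups = defaultdict(list)
    for p in points:
        idx, pt = map_point(p, centroids)
        groups[idx].append(pt)
    # A centroid with no assigned points keeps its previous position (assumption).
    return [
        reduce_cluster(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=100, tol=1e-9):
    """Repeat iterations until no centroid moves more than tol."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids
```

In a distributed setting the map step runs independently on each data split and the reduce step runs once per cluster, which is why K has to be known before the job starts.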

Generating dataset

You can generate a dataset of N points with datasetgen.py. You must also specify the number of clusters K and the standard deviation STD. The command has the form:

python datasetgen.py N K STD

Example:

python datasetgen.py 1000 3 0.45
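To show what the three arguments mean, here is a rough sketch of how such a generator could work. It is not the code of datasetgen.py: the function name generate_dataset, the 2-D points, the [0, 10) range for cluster centers, and the seed parameter are all assumptions for illustration.

```python
import random


def generate_dataset(n, k, std, seed=None):
    """Return n 2-D points drawn around k random centers with spread std."""
    rng = random.Random(seed)
    # Hypothetical choice: centers are uniform in the square [0, 10) x [0, 10).
    centers = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(k)]
    points = []
    for i in range(n):
        # Assign points to centers round-robin so cluster sizes are balanced.
        cx, cy = centers[i % k]
        points.append((rng.gauss(cx, std), rng.gauss(cy, std)))
    return points


if __name__ == "__main__":
    # Mirrors the example command: N=1000, K=3, STD=0.45.
    for x, y in generate_dataset(1000, 3, 0.45, seed=0)[:5]:
        print(f"{x:.4f},{y:.4f}")
```

A smaller STD gives tighter, better separated clusters; a larger one makes them overlap and harder to recover.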

Plotting

If you are running the program on a single node for testing purposes, you can also plot the result using:

python plot.py

Other k-means versions

We also made:

Acknowledgments

Parallel Computing course, Master's Degree in Computer Engineering, University of Florence.

Languages

Java 90.0%, Python 10.0%