W-Mrt / Mini-batch-k-Means-Clustering

Implement mini-batch k-means in the PySpark distributed framework and test the algorithm's performance on standard synthetic datasets.

Web-Scale K-Means Clustering

Project for Management and Analysis of Physical Datasets

Implement and benchmark alternative versions of common clustering algorithms in a Spark environment, without using the clustering functions that Spark already provides.

The project is thus focused on the efficient implementation of algorithms in a distributed system.

Main topics:

Mini-batch k-means, k-means++, k-means||
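
To make the first two topics concrete, here is a minimal single-machine sketch in plain Python, not the repository's PySpark code. It seeds the centers with k-means++ (each new center is drawn with probability proportional to its squared distance from the nearest center chosen so far) and then applies mini-batch updates with a per-center learning rate of 1/count. The function names, parameters, and defaults are all hypothetical, and the sketch assumes the dataset contains more than k distinct points.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: sample new centers proportionally to squared distance."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # Squared distance of every point to its closest existing center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers


def mini_batch_kmeans(points, k, batch_size=50, n_iter=100, seed=0):
    """Mini-batch k-means with a per-center learning rate of 1 / count."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    counts = [0] * k  # number of points each center has absorbed so far
    for _ in range(n_iter):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch first, then update the centers.
        assigned = [nearest(p, centers) for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]
            # Move center j toward p by a step that shrinks as counts[j] grows.
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers
```

In a Spark setting, one plausible adaptation would draw each batch from an RDD (for example with `takeSample`) and compute the assignments on the workers; that mapping is an assumption here, not something the project description specifies.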

Languages

Language: Jupyter Notebook 100.0%