subham-agrawall / clustering-scratch

This repository implements some clustering techniques from scratch, as a way to understand the basic concepts.

Clustering from scratch

This repo contains from-scratch implementations of the k-means and DBSCAN algorithms, applied to a sample dataset.

Dataset

In the figure below, green and blue points represent cluster 1 and cluster 2, respectively, and red points represent noise.

K-Means output

Applying the k-means clustering algorithm to the given dataset with k=2 gives:

TRUE POSITIVE RATE FOR CLUSTER-1 = 15%
TRUE POSITIVE RATE FOR CLUSTER-2 = 16%
No noise points
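
For reference, here is a minimal sketch of k-means (Lloyd's algorithm) in NumPy. It is not the repository's actual code: the function name `kmeans`, the random initialisation from data points, and the `n_iters` and `seed` parameters are all assumptions made for illustration. Note that k-means assigns every point to some cluster, which is consistent with the "No noise points" result above.

```python
import numpy as np


def kmeans(points, k, n_iters=100, seed=0):
    """Hypothetical helper: cluster `points` (n x d array) into k groups.

    Returns (labels, centroids).
    """
    rng = np.random.default_rng(seed)
    # Assumed initialisation: pick k distinct data points as starting centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points.
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return labels, centroids


# Toy data (not the repository's dataset): two well-separated groups.
toy = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(toy, k=2)
```

Which group receives label 0 and which receives label 1 depends on the initialisation, so only the grouping itself is meaningful.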

DBSCAN output

As observed from the figure above and from the code, we get epsilon=1.22 for the given data with k=4. Applying the DBSCAN algorithm with k=4 gives:

TRUE POSITIVE RATE FOR CLUSTER-1 = 100%
TRUE POSITIVE RATE FOR CLUSTER-2 = 100%
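
The sketch below illustrates the two ingredients mentioned above: a k-distance computation, which is one common heuristic for choosing epsilon, and DBSCAN itself. Whether the repository derives epsilon=1.22 in exactly this way is an assumption. The names `k_distances`, `dbscan`, `eps`, and `min_pts` are hypothetical, and `min_pts` here counts the point itself, which is a convention chosen for this example.

```python
import numpy as np

NOISE = -1  # label used here for points that belong to no cluster


def k_distances(points, k):
    """Sorted distance from every point to its k-th nearest neighbour."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k is the k-th nearest other point.
    return np.sort(np.sort(dists, axis=1)[:, k])


def dbscan(points, eps, min_pts):
    """Hypothetical helper: returns one label per point, NOISE for outliers."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Neighbourhoods include the point itself (assumed convention).
    neighbours = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    labels = np.full(n, NOISE)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbours[i]) < min_pts:
            continue  # not a core point; stays noise unless reached later
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # core or border point joins this cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # expand only through core points
        cluster += 1
    return labels


# Toy data (not the repository's dataset): two square groups and one outlier.
toy = np.array(
    [[0, 0], [0, 1], [1, 0], [1, 1],
     [10, 10], [10, 11], [11, 10], [11, 11],
     [5, 20]],
    dtype=float,
)
kd = k_distances(toy, k=3)
labels = dbscan(toy, eps=1.5, min_pts=4)
```

In the k-distance heuristic, the sorted values are plotted and epsilon is read off near the sharp bend ("knee") of the curve. On the toy data the outlier's 3rd-neighbour distance is far larger than that of the grouped points, so any epsilon between the two levels separates it as noise.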

Thus, judging by the figures and the true positive rates, DBSCAN performs better than k-means on the given dataset.
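
A per-cluster true positive rate like the ones reported above can be computed as the fraction of a true cluster's points that land in the matching predicted cluster. The following is a sketch under that assumed definition, and the function name and arguments are hypothetical. The repository may compute the metric differently.

```python
import numpy as np


def true_positive_rate(true_labels, predicted_labels, true_cluster, predicted_cluster):
    """Fraction of points in `true_cluster` that were assigned to `predicted_cluster`."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    in_cluster = true_labels == true_cluster
    hits = in_cluster & (predicted_labels == predicted_cluster)
    return hits.sum() / in_cluster.sum()


# Made-up labels for illustration only.
truth = [1, 1, 1, 1, 2, 2]
predicted = [0, 0, 0, 1, 1, 1]
tpr_cluster_1 = true_positive_rate(truth, predicted, true_cluster=1, predicted_cluster=0)
tpr_cluster_2 = true_positive_rate(truth, predicted, true_cluster=2, predicted_cluster=1)
```

Cluster ids produced by a clustering algorithm are arbitrary, so the mapping between true and predicted ids has to be chosen explicitly before the rates are compared.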

Languages

Language:Python 100.0%