havelhakimi / seeds

Prototype based clustering on seeds dataset

Prototype Based Clustering Analysis on seeds dataset

This is a solution notebook for an assignment question given in a Data Mining graduate course. Each code block is accompanied by relevant analysis wherever required.
Dataset link: https://archive.ics.uci.edu/ml/datasets/seeds
Broadly, the following steps have been performed in this solution notebook:

  • Minimal preprocessing on the dataset
  • Explained limitations of KMeans
  • Suggested two existing algorithms (KMedoids and CLARANS) that mitigate some of the limitations of KMeans
  • Visualized the given class labels using t-SNE
  • Ran KMedoids and CLARANS on the seeds dataset and reported the best results obtained on various cluster validity indices.
    • Further compared the results with KMeans.
  • Reported and visualized the hyperparameter tuning for KMedoids and CLARANS that produced the best results on the seeds dataset

The assumptions above and the flow of work follow the questions asked in the assignment.
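
To illustrate the prototype-based idea behind the steps above, here is a minimal sketch of a swap-based k-medoids search (in the style of PAM) together with a silhouette score as one example of a cluster validity index. It is not the notebook's code: the function names `total_cost`, `pam`, and `silhouette`, the toy points, and the naive "first k points" initialization are all assumptions made for illustration. Unlike KMeans, each cluster is represented by an actual data point (a medoid) rather than a mean. CLARANS is usually described as a randomized variant of this search that examines only a random sample of candidate swaps instead of all of them.

```python
import math

def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k, max_iter=100):
    """Greedy swap-based k-medoids sketch. Returns (medoid indices, labels)."""
    medoids = list(range(k))  # assumption: naive deterministic initialization
    cost = total_cost(points, medoids)
    for _ in range(max_iter):
        best_cost, best_medoids = cost, None
        # Try replacing each medoid with each non-medoid point.
        for pos in range(k):
            for h in range(len(points)):
                if h in medoids:
                    continue
                candidate = medoids[:pos] + [h] + medoids[pos + 1:]
                c = total_cost(points, candidate)
                if c < best_cost:
                    best_cost, best_medoids = c, candidate
        if best_medoids is None:  # no swap lowers the cost: local optimum
            break
        medoids, cost = best_medoids, best_cost
    # Label each point with the position of its nearest medoid.
    labels = [min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (assumes >= 2 clusters)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated groups (not the seeds dataset).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1), (8.1, 8.3)]
medoids, labels = pam(points, k=2)
score = silhouette(points, labels)
print("medoid indices:", medoids)
print("labels:", labels)
print("silhouette:", round(score, 3))
```

The exhaustive swap search costs far more per iteration than a KMeans update, which is one reason sampled variants such as CLARANS exist. For real experiments a library implementation would be the more practical choice.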


Languages

Language:Jupyter Notebook 100.0%