glendawur / MirCl

Small package with useful tools to perform clustering analysis

MirCl

[Miraculous Clustering]

MirCl is a small package that started as the code repository for my bachelor thesis (Application of Anomalous Clustering Methods for Determination of the Number of Clusters) and for further supervised research.

As of now, this package contains just a few useful tools to perform clustering analysis:

  1. Clustering techniques implementation:
    1. K-Means
    2. Random Swap K-Means (2018, Franti)
    3. Anomalous Patterns (2011, Amorim & Mirkin)
  2. Generating Synthetic Data:
    1. Generator of N-dimensional spheres
    2. Generating a dataset according to (2020, Taran & Mirkin)
  3. Indices to choose the optimal number of clusters:
    1. Analytical Elbow
    2. Hartigan Rule
    3. Calinski-Harabasz
    4. Silhouette Width
    5. Xu index
    6. WB index
  4. Metrics to evaluate partitions in supervised way:
    1. Adjusted Rand Index
    2. Normalized/Adjusted Mutual Information
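
To make the list above concrete, here is a minimal NumPy sketch of three of the listed tools: plain K-Means, the Calinski-Harabasz index, and the Adjusted Rand Index. It is written from the standard definitions and is not MirCl's own code or API, which this README does not document. The function names (`kmeans`, `calinski_harabasz`, `adjusted_rand_index`, `demo`) and the Gaussian-blob data in `demo` are assumptions made for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-Means with random initial centers (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster went empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment, consistent with the returned centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers


def calinski_harabasz(X, labels):
    """CH = [B / (k - 1)] / [W / (n - k)]; needs at least 2 non-empty clusters."""
    n = len(X)
    ids = np.unique(labels)
    k = len(ids)
    grand_mean = X.mean(axis=0)
    between = 0.0
    within = 0.0
    for j in ids:
        members = X[labels == j]
        center = members.mean(axis=0)
        between += len(members) * ((center - grand_mean) ** 2).sum()
        within += ((members - center) ** 2).sum()
    return (between / (k - 1)) / (within / (n - k))


def adjusted_rand_index(a, b):
    """ARI from the contingency table; undefined if both partitions are trivial."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=np.int64)
    np.add.at(table, (ai, bi), 1)

    def pairs(x):
        # Number of unordered pairs, C(x, 2), applied elementwise.
        return x * (x - 1) / 2.0

    index = pairs(table).sum()
    rows = pairs(table.sum(axis=1)).sum()
    cols = pairs(table.sum(axis=0)).sum()
    expected = rows * cols / pairs(len(ai))
    maximum = 0.5 * (rows + cols)
    return (index - expected) / (maximum - expected)


def demo():
    # Hypothetical data: three Gaussian blobs in 2D (not MirCl's generators).
    rng = np.random.default_rng(42)
    means = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
    truth = np.repeat(np.arange(3), 50)
    X = means[truth] + rng.normal(size=(150, 2))
    for k in range(2, 7):
        labels, _ = kmeans(X, k, seed=1)
        ch = calinski_harabasz(X, labels)
        ari = adjusted_rand_index(truth, labels)
        print(f"k={k}  CH={ch:.1f}  ARI={ari:.3f}")


if __name__ == "__main__":
    demo()
```

A common way to use such an index is to run the clustering for a range of `k` and keep the value with the highest Calinski-Harabasz score. The Adjusted Rand Index is only available when ground-truth labels exist, as with synthetic data. Note that K-Means with a single random initialization can stall in a poor local optimum, which is one motivation for variants such as Random Swap.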

You can find two showcase notebooks in this folder.

Miraculous Example

To-do:

  • [] Add stochastic Maxmin initialization
  • [] Add more generators of synthetic data
  • [] Add fast computation of distances with JAX/Numba
  • [] Add batch versions of clustering techniques
  • [] Add modifications of Anomalous Patterns algorithm
  • [] Add more metrics to evaluate the partition

Requirements:

  • numpy>=1.21.5
  • scipy>=1.9.1
  • pandas>=1.4.4
  • matplotlib>=3.5.2

License: MIT License

