AdityaSavara / k-means-constrained

K-Means clustering - constrained with minimum and maximum cluster size

Home Page:https://github.com/joshlk/k-means-constrained

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

PyPI Python Build Status Documentation

k-means-constrained

K-means clustering implementation whereby a minimum and/or maximum size for each cluster can be specified.

This K-means implementation modifies the cluster assignment step (E in EM) by formulating it as a Minimum Cost Flow (MCF) linear network optimisation problem. This is then solved using a cost-scaling push-relabel algorithm and uses Google's Operations Research tools's SimpleMinCostFlow which is a fast C++ implementation.

This package is inspired by Bradley et al.. The original Minimum Cost Flow (MCF) network proposed by Bradley et al. has been modified so maximum cluster sizes can also be specified along with minimum cluster size.

The code is based on scikit-lean's KMeans and implements the same API with modifications.

Ref:

  1. Bradley, P. S., K. P. Bennett, and Ayhan Demiriz. "Constrained k-means clustering." Microsoft Research, Redmond (2000): 1-8.
  2. Google's SimpleMinCostFlow C++ implementation

Installation

You can install the k-means-constrained from PyPI:

pip install k-means-constrained

It is supported on Python 3.8 and above.

Example

More details can be found in the API documentation.

>>> from k_means_constrained import KMeansConstrained
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...                [4, 2], [4, 4], [4, 0]])
>>> clf = KMeansConstrained(
...     n_clusters=2,
...     size_min=2,
...     size_max=5,
...     random_state=0
... )
>>> clf.fit_predict(X)
array([0, 0, 0, 1, 1, 1], dtype=int32)
>>> clf.cluster_centers_
array([[ 1.,  2.],
       [ 4.,  2.]])
>>> clf.labels_
array([0, 0, 0, 1, 1, 1], dtype=int32)

About

K-Means clustering - constrained with minimum and maximum cluster size

https://github.com/joshlk/k-means-constrained

License:BSD 3-Clause "New" or "Revised" License


Languages

Language:Python 82.3%Language:Cython 10.7%Language:Jupyter Notebook 5.3%Language:Makefile 1.3%Language:Batchfile 0.4%