Clusterix: A visual analytics approach to data clustering

Clusterix is a web-based visual analytics tool that supports clustering tasks while keeping the analyst at the center of the workflow. Clusterix provides the facilities to:

load and preview JSON, CSV, or XML data;
select the columns to be considered by the clustering algorithm and modify their weights;
select and run one or more clustering algorithms (k-means, hierarchical clustering) with varying parameters;
view and interact with the results in a browser environment;
modify the parameters or input data to correct the clustering output.

Such an iterative, visual analytics approach allows users to quickly determine the best clustering algorithm and parameters for their data, and to correct any errors in the clustering output. Clusterix has been applied to the clustering of heterogeneous data sets.
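The column selection and weighting step can be sketched in a few lines of standard-library Python. This is only an illustration of the idea, not Clusterix's own code: the function name `load_weighted_rows`, the sample columns, and the choice to apply a weight by simple multiplication are all assumptions.

```python
import csv
import io


def load_weighted_rows(csv_text, weights):
    """Parse CSV text and return one weighted numeric vector per row.

    `weights` maps a column name to its weight (hypothetical interface);
    columns that are not listed are left out of the vectors.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = list(weights)
    return [[float(row[c]) * weights[c] for c in columns] for row in reader]


# Made-up sample data: only "alcohol" and "color" are selected,
# and "color" counts twice as much as "alcohol".
sample = "name,alcohol,ash,color\nA,13.0,2.0,5.0\nB,12.0,2.5,3.0\n"
vectors = load_weighted_rows(sample, {"alcohol": 1.0, "color": 2.0})
print(vectors)
```

Raising a column's weight stretches that axis, so differences in that column contribute more to the distances a clustering algorithm sees.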

Usage

First, install the requirements:

pip install -r requirements.txt

To run the project:

python manage.py runserver

This command runs Clusterix at http://127.0.0.1:5000, where you can use the interface to upload data files and select the algorithms and options you want.

Features

File input (CSV only currently)

Data Preview
Field selection
Field Scaling
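The list above does not say how fields are scaled. As one common possibility, assumed here purely for illustration, min-max scaling maps each numeric column onto the range [0, 1] so that columns with large units do not dominate distance calculations. The helper name `min_max_scale` is hypothetical.

```python
def min_max_scale(values):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map it to zeros.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([11.0, 12.0, 15.0])
print(scaled)
```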

Vectorizers

Count Vectorizer
Tf-Idf Vectorizer
Hashing Vectorizer
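To show what these three vectorizers compute, here is a minimal standard-library sketch. It is not the implementation Clusterix uses (the source does not name one), and it makes simplifying assumptions: whitespace tokenization, an unsmoothed `idf = ln(N / df)`, and CRC32 as the hash function. All function names are hypothetical.

```python
import math
import zlib
from collections import Counter


def count_vectorize(docs):
    """Return (vocabulary, count matrix) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = []
    for toks in tokenized:
        freq = Counter(toks)
        counts.append([freq[term] for term in vocab])
    return vocab, counts


def tfidf_vectorize(docs):
    """Weight raw counts by idf = ln(N / df); smoothed variants also exist."""
    vocab, counts = count_vectorize(docs)
    n_docs = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in counts]


def hashing_vectorize(docs, n_features=8):
    """Map tokens to a fixed number of buckets without storing a vocabulary."""
    rows = []
    for doc in docs:
        row = [0] * n_features
        for tok in doc.lower().split():
            row[zlib.crc32(tok.encode("utf-8")) % n_features] += 1
        rows.append(row)
    return rows


docs = ["red wine red", "white wine"]
vocab, counts = count_vectorize(docs)
_, tfidf = tfidf_vectorize(docs)
hashed = hashing_vectorize(docs)
```

In this sketch, a term that appears in every document (here "wine") gets a TF-IDF weight of zero, while the hashing variant trades possible bucket collisions for a fixed vector size.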

Algorithms

K-Means
Hierarchical Clustering (with various distance/linkage options)
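For readers unfamiliar with k-means, the following is a minimal sketch of the standard Lloyd iteration in plain Python (3.8+ for `math.dist`). It illustrates the algorithm in general and is not taken from Clusterix; the function name, the explicit initial centroids, and the sample points are assumptions made for the example.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change, so the result is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[2]])
print(labels, centroids)
```

The result depends on the initial centroids and on the number of clusters, which is one reason to rerun with different parameters and inspect the output visually.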

Plot Features

Scatterplot/Treemap visualizations
Full text search for nodes
Brushing and zoom for targeted inspection
Various clustering metrics (TF-IDF, etc.)

Screenshots

Wine Data
