danpaulsmith / clusterix

Visual exploration of clustered data.

Clusterix: A visual analytics approach to data clustering

Clusterix is a web-based visual analytics tool that supports clustering tasks while keeping the analyst at the center of the workflow. Clusterix provides the facilities to:

  • load and preview JSON, CSV, or XML data;
  • select columns to be considered by the clustering algorithm and modify weights;
  • select and run one or more clustering algorithms (k-means, hierarchical clustering) with varying parameters;
  • view and interact with the results in a browser environment;
  • modify the parameters or input data to correct the clustering output.

This iterative, visual approach lets users quickly determine the best clustering algorithm and parameters for their data, and correct any errors in the clustering output. Clusterix has been applied to the clustering of heterogeneous data sets.
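
The parameter loop described above can be illustrated outside the interface. The following is a minimal sketch, not Clusterix's own code: it runs a plain k-means for several values of k on a made-up two-blob data set and records the within-cluster sum of squares (inertia) for comparison. The function names, the sample points, and the deterministic farthest-first seeding are all assumptions made for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, inertia)."""
    # Deterministic farthest-first seeding: an assumption made so the sketch
    # is reproducible. Real tools usually use random or k-means++ seeds.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels

    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]

labels_by_k, inertia_by_k = {}, {}
for k in (1, 2, 3):
    labels_by_k[k], inertia_by_k[k] = kmeans(points, k)
    print(k, labels_by_k[k], round(inertia_by_k[k], 3))
```

A sharp drop in inertia followed by a much smaller one is a common hint that the lower k already captures the main structure. In Clusterix, the same judgment is made visually from the plots.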

Usage

First, install the requirements:

pip install -r requirements.txt

To run the project:

python manage.py runserver

This command starts Clusterix at http://127.0.0.1:5000, where you can use the interface to upload data files and select the algorithms and options you want.

Features

File input (currently CSV only)

  • Data Preview
  • Field selection
  • Field Scaling
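
As a rough illustration of these three steps, the sketch below previews a small CSV, selects two numeric fields, and scales them. It assumes that field scaling means min-max normalisation followed by a per-field weight, which may differ from what Clusterix does internally. The column names, sample values, and helper functions are hypothetical.

```python
import csv
import io

# Hypothetical sample file; the columns are invented for this example.
SAMPLE_CSV = """name,alcohol,ash,colour
wine_a,14.2,2.4,5.6
wine_b,13.2,2.1,4.4
wine_c,12.4,2.7,3.1
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def select_and_scale(rows, weights):
    """Keep only the fields named in `weights`, min-max scale each one to
    [0, 1], then multiply it by its weight. Returns one vector per row."""
    columns = {}
    for field, weight in weights.items():
        values = [float(row[field]) for row in rows]
        low, high = min(values), max(values)
        span = (high - low) or 1.0  # avoid dividing by zero for constant fields
        columns[field] = [weight * (v - low) / span for v in values]
    return [[columns[field][i] for field in weights] for i in range(len(rows))]


rows = load_rows(SAMPLE_CSV)
preview = rows[:2]  # data preview: the first two records
vectors = select_and_scale(rows, {"alcohol": 1.0, "ash": 0.5})

for row, vector in zip(rows, vectors):
    print(row["name"], [round(v, 3) for v in vector])
```

Giving a field a smaller weight shrinks its range, so it contributes less to the distances the clustering algorithm computes.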

Vectorizers

  • Count Vectorizer
  • Tf-Idf Vectorizer
  • Hashing Vectorizer

Algorithms

  • K-Means
  • Hierarchical Clustering (with various distance/linkage options)
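
To show what the linkage option changes, the following is a naive agglomerative clustering sketch with single, complete, and average linkage over Euclidean distance. It is written for clarity rather than speed and is not Clusterix's implementation. The function names and sample points are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# How to turn all pairwise distances between two clusters into one score.
LINKAGES = {
    "single": min,                                     # closest pair
    "complete": max,                                   # farthest pair
    "average": lambda dists: sum(dists) / len(dists),  # mean over all pairs
}


def agglomerative(points, n_clusters, linkage="single", distance=euclidean):
    """Repeatedly merge the two closest clusters until n_clusters remain.
    Returns clusters as sorted lists of point indices."""
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dists = [
                    distance(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                ]
                score = combine(dists)
                if best is None or score < best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return [sorted(cluster) for cluster in clusters]


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]

results = {name: agglomerative(points, 2, linkage=name) for name in LINKAGES}
for name, clusters in results.items():
    print(name, clusters)
```

On well-separated data like this, all three linkages agree. They tend to diverge on elongated or noisy clusters, which is where comparing the options visually becomes useful.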

Plot Features

  • Scatterplot/Treemap visualizations
  • Full text search for nodes
  • Brushing and zoom for targeted inspection
  • Various clustering metrics (TF-IDF, etc.)

Screenshots

Wine Data
