Clusterix is a web-based visual analytics tool that aims to support clustering tasks while keeping the analyst at the center of the workflow. Clusterix provides the facilities to:
- load and preview JSON, CSV, or XML data;
- select columns to be considered by the clustering algorithm and modify weights;
- select and run one or more clustering algorithms (k-means, hierarchical clustering) with varying parameters;
- view and interact with the results in a browser environment;
- modify the parameters or input data to correct the clustering output.
This iterative, visual analytics approach allows users to quickly determine the best clustering algorithm and parameters for their data, and to correct any errors in the clustering output. Clusterix has been applied to the clustering of heterogeneous data sets.
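The column selection, weighting, and clustering steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Clusterix's own code: the function `weighted_kmeans`, its parameters, and the sample rows are hypothetical, and it uses a textbook Lloyd's k-means rather than whatever implementation Clusterix calls internally.

```python
import random


def weighted_kmeans(rows, columns, weights, k, iterations=20, seed=0):
    """Cluster `rows` (a list of dicts) on the selected `columns`.

    Hypothetical helper for illustration only; not part of Clusterix.
    """
    # Keep only the selected columns and multiply each by its weight, so a
    # larger weight makes that column count more in the distance.
    points = [
        [float(row[c]) * weights.get(c, 1.0) for c in columns]
        for row in rows
    ]

    # Start from k distinct rows chosen with a fixed seed for repeatability.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]

    return labels


# Made-up sample data: "name" is ignored because it is not selected.
rows = [
    {"name": "a", "height": 1.0, "width": 1.1},
    {"name": "b", "height": 1.2, "width": 0.9},
    {"name": "c", "height": 8.0, "width": 8.2},
    {"name": "d", "height": 8.3, "width": 7.9},
]
labels = weighted_kmeans(
    rows, columns=["height", "width"], weights={"height": 1.0, "width": 0.5}, k=2
)
print(labels)
```

Rerunning with a different `k`, column selection, or weights and comparing the labels mirrors, in miniature, the adjust-and-rerun loop that the interface supports visually.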
First, install the requirements:
pip install -r requirements.txt
To run the project:
python manage.py runserver
This command serves Clusterix at http://127.0.0.1:5000, where you can use the interface to upload data files and select the algorithms and options you want.
- Data Preview
- Field selection
- Field Scaling
- Count Vectorizer
- Tf-Idf Vectorizer
- Hashing Vectorizer
- K-Means
- Hierarchical Clustering (with various distance/linkage options)
- Scatterplot/Treemap visualizations
- Full text search for nodes
- Brushing and zoom for targeted inspection
- Various clustering metrics (TF-IDF, etc.)
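
To give a feel for what the count and TF-IDF vectorizers in this list do to a text field before clustering, here is a minimal plain-Python sketch. It is an illustration under assumptions, not Clusterix's implementation: the function names are hypothetical, tokenization is a simple lowercase whitespace split, and the weighting is the textbook `tf * log(N / df)` form, which may differ from the variant Clusterix actually applies.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplest possible tokenizer: lowercase, split on whitespace.
    return text.lower().split()


def count_vectorize(documents):
    """Return (vocabulary, vectors) where each vector holds term counts."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The sorted vocabulary fixes the column order of every vector.
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, vectors


def tfidf_vectorize(documents):
    """Return (vocabulary, vectors) weighted by tf * log(N / df)."""
    vocabulary, vectors = count_vectorize(documents)
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for v in vectors if v[i] > 0) for i in range(len(vocabulary))]
    idf = [math.log(n_docs / d) for d in df]
    return vocabulary, [[tf * w for tf, w in zip(v, idf)] for v in vectors]


# Made-up sample documents.
docs = ["red apple", "green apple", "red car"]
vocabulary, counts = count_vectorize(docs)
_, tfidf = tfidf_vectorize(docs)
print(vocabulary)
print(counts)
print(tfidf)
```

Terms that appear in fewer documents receive a larger weight, so rare words contribute more to the distance between two rows than words shared by most of the data set.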