OPC: Optimal Projections for Clustering

An open source MATLAB/Octave library for Dimensionality Reduction for Clustering

Author: Nicos G. Pavlidis

Date: 2018-07-11

INTRODUCTION

OPC is an open source MATLAB and GNU Octave package that implements clustering methods which seek the optimal low-dimensional subspace in which to identify clusters.

Whenever the data contains irrelevant features, or correlations exist among subsets of features (which is typical in high-dimensional data), or clusters are defined in different subspaces, the spatial structure of the data becomes less informative about the underlying clusters. Under these conditions clustering algorithms need to solve two interrelated problems simultaneously: (i) identify the subspace in which clusters can be distinguished, and (ii) assign observations to clusters.

OPC focuses on methods that seek low-dimensional subspaces which are optimal with respect to specific clustering criteria. This distinguishes the methods in OPC from generic dimensionality reduction techniques, which optimise objective functions unrelated to any clustering criterion and are therefore not guaranteed to preserve the cluster structure.
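The difference can be seen on a toy problem. The sketch below does not use any OPC function. It takes principal component analysis as one example of a generic dimensionality reduction technique and applies it to synthetic data (an assumption made purely for illustration) in which the high-variance feature is irrelevant and the two clusters differ only along a low-variance feature. The separation score `sep` is a hypothetical helper defined in the snippet.

```matlab
% Synthetic data (illustrative assumption, not an OPC dataset):
% feature 1 is irrelevant but has large variance; the two clusters
% differ only in feature 2.
n = 500;                                   % observations per cluster
X = [10*randn(n,1), randn(n,1) - 3;        % cluster 1
     10*randn(n,1), randn(n,1) + 3];       % cluster 2
labels = [ones(n,1); 2*ones(n,1)];

% First principal component from the SVD of the centred data.
Xc = bsxfun(@minus, X, mean(X, 1));
[~, ~, V] = svd(Xc, 'econ');
v_pca = V(:, 1);                           % direction of maximum variance
v_sep = [0; 1];                            % direction along which the clusters differ

% Hypothetical separation score: gap between the projected cluster means
% divided by the average within-cluster standard deviation.
sep = @(p) abs(mean(p(labels == 1)) - mean(p(labels == 2))) / ...
           mean([std(p(labels == 1)), std(p(labels == 2))]);

sep_pca  = sep(X * v_pca);
sep_true = sep(X * v_sep);
fprintf('separation along first principal component: %.2f\n', sep_pca);
fprintf('separation along feature 2:                 %.2f\n', sep_true);
```

With this construction the first principal component aligns almost entirely with the irrelevant feature, so the projected clusters overlap, whereas projecting onto feature 2 keeps them well apart.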

DEPENDENCIES

  • To install the package you need a C/C++ compiler. We recommend using the GCC compiler.

  • OPC depends on the improved Fast Gauss Transform.

  • OPC contains a cluster tree class, called ctree. This is a modification of the MATLAB class tree implemented by Jean-Yves Tinevez.

  • In MATLAB, OPC depends on the Optimization and Statistics toolboxes.

  • In GNU Octave, OPC depends on the optim and statistics packages.

INSTALLATION

  • Download the latest OPC release from this page.

  • Uncompress the opc-master.zip file.

unzip opc-master.zip
cd opc-master

  • Compile the C++ FIGTree library by following the instructions in the README.md file located in opc-master/src/libs/figtree-0.9.3/, or equivalently in the FIGTree GitHub repository.

  • After the C++ code is compiled, you need to compile the MATLAB/Octave interface to the FIGTree library, as well as two C++ functions for kernel density estimation included in OPC. For ease of use, the script install.m in the root OPC directory performs these tasks. (Note that install.m assumes that the FIGTree library is in the folder DOWNLOAD-PATH/opc-master/src/libs/figtree-0.9.3/. If the library is located elsewhere, edit the script to provide the correct path.)

>> cd('DOWNLOAD-PATH/opc-master/')
>> install

  • In GNU Octave you also need to install and load the optim and statistics packages, which can be found at the extra packages for GNU Octave repository.
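As a minimal sketch of the GNU Octave step above, the two packages can usually be installed and loaded with Octave's built-in pkg command. This assumes an internet connection and an Octave build that can compile packages. If a package has unmet prerequisites, Octave reports them, and they can be installed in the same way first.

```matlab
% Run once inside GNU Octave to download and install the packages.
pkg install -forge optim
pkg install -forge statistics

% Run in every new Octave session (for example from ~/.octaverc).
pkg load optim
pkg load statistics
```

Running `pkg list` afterwards shows the installed packages and marks the loaded ones with an asterisk.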

Setting the path

After each restart of MATLAB/Octave you must add the root OPC directory and all its subdirectories to the search path. The setpath.m script does this.

>> cd('DOWNLOAD-PATH/opc-master/')
>> setpath
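To avoid repeating this step by hand, a few lines can be placed in startup.m (MATLAB) or ~/.octaverc (GNU Octave). The sketch below is not the content of setpath.m; it only reproduces the behaviour described above with the standard addpath and genpath functions. The variable opc_root is a hypothetical name, and its value is the same placeholder used throughout these instructions, to be replaced with the actual location.

```matlab
% Hypothetical startup snippet: add OPC and its subdirectories to the path.
% Replace the placeholder with the directory where OPC was unpacked.
opc_root = 'DOWNLOAD-PATH/opc-master/';

if exist(opc_root, 'dir') == 7
    addpath(genpath(opc_root));   % root directory plus all subdirectories
else
    warning('OPC directory not found: %s', opc_root);
end
```

Running setpath from the OPC root directory remains the documented route. This snippet is a convenience under the assumption that adding every subdirectory is all that is required.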

TESTING

After installation you can execute the script reproduction_script.m (located in the root OPC directory) to reproduce all the examples in the documentation (documentation/documentation.pdf). If this script exits without an error, OPC is configured correctly.

>> cd('DOWNLOAD-PATH/opc-master/')
>> reproduction_script

More detailed instructions for installing OPC are provided in the file user_guide.pdf.

DOCUMENTATION

  • The online documentation guide contains detailed installation instructions and a large number of nontrivial examples that illustrate how to use and extend OPC.
    A PDF version of the documentation is also available in the file user_guide.pdf.

  • An online function reference for OPC is available here.

  • The script file reproduction_script.m located in the root OPC directory contains all the examples discussed in the documentation.

CONTRIBUTING

The preferred way to contribute to OPC is to fork the main repository on GitHub. Detailed instructions are available through the online Git documentation.

Please use GitHub issues to file bug reports and feature requests.

LICENSE

This project is licensed under the BSD-3-Clause License; see the LICENSE.md file for details.

ACKNOWLEDGMENTS

The following people contributed to OPC (in alphabetical order):

  • Michael Epitropakis
  • David Hofmeyr
  • Dimitris Kostaras
  • Hankui Peng
  • Sotiris Tasoulis
