KDD-OpenSource / coreq

CORelation EQuivalence (COREQ) algorithm to compute low redundancy correlation matrices

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

COREQ

An implementation of the COREQ (CORrelation EQuivalence) algorithm to compute low redundancy estimates for large correlation matrices. The algorithm is described and analysed in detail in the paper:

Erik Scharwächter, Fabian Geier, Lukas Faber, Emmanuel Müller. Low Redundancy Estimation of Correlation Matrices for Time Series using Triangular Bounds. In: Proceedings of the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2018.

Contact and citation

  • Corresponding author: Erik Scharwächter
  • Please cite our paper if you use or modify our code for your own work.

Description

Please check our our project website at https://hpi.de/mueller/coreq.html for more information on the method, supplementary material and visualizations.

Installation

Automatically build and install the BlockCorr module for Python2 and Python3 with

$ make
$ make install

or use the setup.py script (check --help for details).

Usage

Example: A dataset of 1000 time series (length 50) that consists of two perfectly separated clusters of size 700 and 300.

>>> import numpy as np
>>> import BlockCorr
>>> np.random.seed(0)
>>> X = np.random.permutation(np.concatenate((np.repeat(np.random.randn(1, 50), 700, axis=0),
            np.repeat(np.random.randn(1, 50), 300, axis=0))))
>>> X.shape
(1000, 50)
>>> alpha = 0.8
>>> labels, pivots, pivot_corr_triu, computations = BlockCorr.COREQ(X, BlockCorr.ESTIMATE_AVERAGE, alpha)
clustering finished with 1300 correlation computations --- 2 clusters detected
>>> pivot_corr_triu
array([ 1.        , -0.05918059,  1.        ])

>>> # full pivot correlation matrix:
>>> pivot_corr = np.zeros((len(pivots), len(pivots)))
>>> pivot_corr[np.triu_indices(len(pivots))] = pivot_corr_triu
>>> pivot_corr[np.tril_indices(len(pivots))] = pivot_corr_triu

>>> # full time series correlation matrix:
>>> cluster_matrix = np.zeros((X.shape[0], len(pivots)))
>>> cluster_matrix[range(0,len(labels)), labels] = 1
>>> R = cluster_matrix.dot(pivot_corr).dot(cluster_matrix.transpose())
>>> np.abs(R - np.corrcoef(X)).sum()
0.0

License

The code is released under the MIT license.

About

CORelation EQuivalence (COREQ) algorithm to compute low redundancy correlation matrices

License:MIT License


Languages

Language:C 88.2%Language:Python 11.3%Language:Makefile 0.5%