catch-22: CAnonical Time-series CHaracteristics

This is a collection of 22 time-series features from the hctsa toolbox, coded in C. Features were selected for their classification performance across a collection of 93 real-world time-series classification problems (according to the op_importance repository).

NOTE: The included features only evaluate dynamical properties of time series and do not respond to basic differences in the location (e.g., mean) or spread (e.g., variance). If you think features of the raw distribution may be important for your application, we suggest you add them (in the simplest case, the mean and standard deviation) to this feature set.
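As a minimal sketch of the suggestion above, the following appends the mean and standard deviation to a feature set. It assumes the features are held in a dictionary with 'names' and 'values' lists (the structure the Python wrapper's catch22_all is described as returning); the helper name add_location_spread and the placeholder feature are hypothetical.

```python
import statistics


def add_location_spread(features, ts_data):
    """Return a copy of a {'names': [...], 'values': [...]} feature dict
    with the mean and standard deviation of ts_data appended.

    Hypothetical helper, not part of catch22.
    """
    names = list(features["names"]) + ["mean", "std"]
    values = list(features["values"]) + [
        statistics.fmean(ts_data),
        statistics.stdev(ts_data),  # sample standard deviation (n - 1)
    ]
    return {"names": names, "values": values}


# Placeholder feature dict standing in for real catch22 output
features = {"names": ["feature_a"], "values": [0.5]}
extended = add_location_spread(features, [1.0, 2.0, 3.0, 4.0])
print(extended["names"])
```

Whether the sample or the population standard deviation is more appropriate depends on the application; the sample version is used here only as an example.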

For information on how this feature set was constructed see our open-access paper:

For information on the full set of over 7000 features, see the following open-access publications:

Using the catch22-features from Python, Matlab and R

The fast C-coded functions in this repository can be used in Python, Matlab, and R following the instructions below. Time series are z-scored internally, which means that, e.g., constant time series will lead to NaN outputs. The wrappers for Matlab and Python run using either GCC or MSVC as compiler. The R wrapper so far only runs using GCC and was only tested on OS X.
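To illustrate why a constant time series produces NaN, here is a rough pure-Python sketch of z-scoring. It is not the C implementation: the use of the population standard deviation and the helper names zscore and is_constant are assumptions made for illustration.

```python
import math


def zscore(ts_data):
    """Z-score a sequence; returns NaNs when the standard deviation is zero."""
    n = len(ts_data)
    mean = sum(ts_data) / n
    # Assumption: population standard deviation (divide by n)
    std = math.sqrt(sum((x - mean) ** 2 for x in ts_data) / n)
    if std == 0.0:
        # In C, 0.0 / 0.0 evaluates to NaN; mirror that here explicitly
        return [math.nan] * n
    return [(x - mean) / std for x in ts_data]


def is_constant(ts_data):
    """True if every value in the series is identical."""
    return len(set(ts_data)) <= 1


print(zscore([3.0, 3.0, 3.0]))
print(is_constant([3.0, 3.0, 3.0]))
```

A check like is_constant can be run before computing features, so that constant inputs are handled deliberately rather than showing up later as NaN values.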

Python

Installation of the Python wrapper differs slightly between Python 2 and 3.

Installation Python 3

Manual installation through distutils

python3 setup_P3.py build
python3 setup_P3.py install

Or using pip

pip install catch22

Installation Python 2

Go to the directory wrap_Python and run the following

python setup.py build
python setup.py install

or alternatively, using pip, go to main directory and run

pip install -e wrap_Python

Test Python 2 and 3

To test that the catch22 wrapper was installed successfully and works, run (NB: replace python with python3 for Python 3):

$ python testing.py

The module is now available under the name catch22. Each feature function can be accessed individually and takes its input as a tuple or list (not a numpy array). E.g., for loaded data tsData in Python:

import catch22
catch22.CO_f1ecac(tsData)
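Since the individual feature functions expect a tuple or list, array-like data may need converting first. The sketch below shows one way to do that for one-dimensional data; as_plain_list is a hypothetical helper, and array.array is used only as a stand-in for any object exposing a tolist() method (numpy arrays do).

```python
from array import array


def as_plain_list(ts_data):
    """Convert a 1-D array-like object to a plain list of floats.

    Hypothetical helper, not part of catch22.
    """
    if hasattr(ts_data, "tolist"):
        ts_data = ts_data.tolist()
    return [float(x) for x in ts_data]


ts_list = as_plain_list(array("d", [1.0, 2.5, 3.0]))
print(ts_list)
# With the wrapper installed, the list could then be passed on, e.g.:
# catch22.CO_f1ecac(ts_list)
```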

All features are bundled in the method catch22_all, which also accepts numpy arrays and returns a dictionary with the entries 'names' for feature names and 'values' for feature outputs.

from catch22 import catch22_all
catch22_all(tsData)
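The returned dictionary holds two parallel lists, so looking up a feature by name takes an extra step. A small sketch of that step follows; features_to_dict is a hypothetical helper, and the result dictionary is a hand-written stand-in with placeholder names and values rather than real output.

```python
def features_to_dict(result):
    """Map feature names to values from a {'names': [...], 'values': [...]} dict.

    Hypothetical helper, not part of catch22.
    """
    names, values = result["names"], result["values"]
    if len(names) != len(values):
        raise ValueError("names and values must have the same length")
    return dict(zip(names, values))


# Stand-in for the output of catch22_all(tsData); values are placeholders
result = {"names": ["CO_f1ecac", "feature_b"], "values": [1.5, -0.25]}
by_name = features_to_dict(result)
print(by_name["CO_f1ecac"])
```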

R

This assumes you have R installed and the package Rcpp is available. Clang is required.

Copy all .c- and .h-files from ./C to ./wrap_R/catch22/src. Then go to the directory ./wrap_R and run the following two lines, replacing x.y with the current version number:

R CMD build catch22
R CMD INSTALL catch22_x.y.tar.gz

To test if the installation was successful, navigate to ./wrap_R in the console and run:

$ Rscript testing.R

The module is now available in R as catch22. Single functions can be accessed by their name. All functions are bundled as catch22_all, which is called with a data vector tsData as its argument and returns a data frame with the variables name for feature names and values for feature outputs:

library(catch22)
catch22_out = catch22_all(tsData);
print(catch22_out)

Matlab

Go to the wrap_Matlab directory and call mexAll from within Matlab. Include the folder in your Matlab path to use the package.

To test, navigate to the wrap_Matlab directory from within Matlab and run:

testing

All features can be called individually, e.g., catch22_CO_f1ecac. Alternatively, all features are bundled in a function catch22_all, which returns an array of feature outputs and, as a second output, a cell array of feature names. With loaded data tsData:

[vals, names] = catch22_all(tsData);

Raw C

Compilation

OS X

gcc -o run_features main.c CO_AutoCorr.c DN_HistogramMode_10.c DN_HistogramMode_5.c DN_OutlierInclude.c FC_LocalSimple.c IN_AutoMutualInfoStats.c MD_hrv.c PD_PeriodicityWang.c SB_BinaryStats.c SB_CoarseGrain.c SB_MotifThree.c SB_TransitionMatrix.c SC_FluctAnal.c SP_Summaries.c butterworth.c fft.c helper_functions.c histcounts.c splinefit.c stats.c

Ubuntu

As for OS X, but with the -lm switch in front of every source-file name.

Usage

Single files

The compiled run_features program takes only one time series at a time. Usage is ./run_features <infile> <outfile> in the terminal. Specifying <outfile> is optional; the output is printed to stdout by default.

Multiple files

For multiple time series, put them (one file each) into a folder timeSeries and call ./runAllTS.sh. The output will be written into a folder featureOutput. If runAllTS.sh is not executable, first change its permissions by calling chmod 755 runAllTS.sh.
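For readers who prefer to drive the batch run from Python, a rough equivalent might look like the sketch below. It is not a port of runAllTS.sh: the output naming scheme (<stem>_output.txt) and the helper names build_commands and run_all are assumptions, and only the ./run_features <infile> <outfile> usage is taken from this README.

```python
import subprocess
from pathlib import Path


def build_commands(in_dir="timeSeries", out_dir="featureOutput",
                   binary="./run_features"):
    """Build one run_features command per file in in_dir."""
    commands = []
    for infile in sorted(Path(in_dir).iterdir()):
        if infile.is_file():
            # Assumed naming scheme; the shell script may name outputs differently
            outfile = Path(out_dir) / f"{infile.stem}_output.txt"
            commands.append([binary, str(infile), str(outfile)])
    return commands


def run_all(in_dir="timeSeries", out_dir="featureOutput"):
    """Run the compiled program once per input file."""
    Path(out_dir).mkdir(exist_ok=True)
    for cmd in build_commands(in_dir, out_dir):
        subprocess.run(cmd, check=True)  # raises if the program exits non-zero
```

Calling run_all() requires the compiled run_features binary in the current directory; build_commands can be inspected on its own to check which files would be processed.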

Output format

Each line of the output corresponds to one feature; the three comma-separated entries per line are the feature value, the feature name, and the feature execution time in milliseconds. For example:

0.29910714285714, CO_Embed2_Basic_tau.incircle_1, 0.341000
0.57589285714286, CO_Embed2_Basic_tau.incircle_2, 0.296000
...
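The format described above can be read back with a few lines of Python. This is a minimal sketch; FeatureLine, parse_feature_line and parse_output are hypothetical names, and the sample text reuses the two example lines shown above.

```python
from typing import NamedTuple


class FeatureLine(NamedTuple):
    value: float
    name: str
    time_ms: float


def parse_feature_line(line):
    """Split one 'value, name, time' line into typed fields."""
    value, name, time_ms = (part.strip() for part in line.split(","))
    return FeatureLine(float(value), name, float(time_ms))


def parse_output(text):
    """Parse every non-empty line of a run_features output."""
    return [parse_feature_line(line) for line in text.splitlines() if line.strip()]


sample = """0.29910714285714, CO_Embed2_Basic_tau.incircle_1, 0.341000
0.57589285714286, CO_Embed2_Basic_tau.incircle_2, 0.296000"""
rows = parse_output(sample)
print(rows[0].name, rows[0].value)
```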

Testing

Sample outputs for the time series test.txt and test2.txt are provided as test_output.txt and test2_output.txt. The first two entries per line should always be the same. The third one (execution time) will be different.
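A comparison that ignores the execution-time column could be sketched as follows. The function names are hypothetical, and the small numerical tolerance is an assumption added to absorb formatting differences; set rel_tol to 0 for an exact comparison.

```python
import math


def value_and_name(line):
    """Return (value, name) from one output line, dropping the timing column."""
    value, name, _time_ms = (part.strip() for part in line.split(","))
    return float(value), name


def outputs_match(actual_text, expected_text, rel_tol=1e-9):
    """Compare two outputs on feature value and name only."""
    actual = [value_and_name(l) for l in actual_text.splitlines() if l.strip()]
    expected = [value_and_name(l) for l in expected_text.splitlines() if l.strip()]
    if len(actual) != len(expected):
        return False
    for (a_val, a_name), (e_val, e_name) in zip(actual, expected):
        if a_name != e_name:
            return False
        both_nan = math.isnan(a_val) and math.isnan(e_val)
        if not both_nan and not math.isclose(a_val, e_val,
                                             rel_tol=rel_tol, abs_tol=1e-12):
            return False
    return True


mine = "0.29910714285714, CO_Embed2_Basic_tau.incircle_1, 0.512000"
reference = "0.29910714285714, CO_Embed2_Basic_tau.incircle_1, 0.341000"
print(outputs_match(mine, reference))
```

In practice, the two strings would be the contents of a freshly generated output file and of the provided test_output.txt.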

License: GNU General Public License v3.0