galic1987 / hctsa

Highly comparative time-series analysis code repository

Home page: https://hctsa-users.gitbook.io/hctsa-manual

hctsa, highly comparative time-series analysis

hctsa is a software package for running highly comparative time-series analysis using Matlab (full support for versions R2014b or later; for use in python, see pyopy).

The software provides a code framework that allows thousands of time-series analysis features to be extracted from time series (or a time-series dataset), as well as tools for normalizing and clustering the data, producing low-dimensional representations of the data, identifying discriminating features between different classes of time series, learning multivariate classification models using large sets of time-series features, finding nearest matches to a time series of interest, and a range of other visualizations and analyses.
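The core idea behind this workflow is to represent each time series as a vector of features, normalize each feature across the dataset, and then compare series in that feature space. The toy python sketch below illustrates the idea. It is not hctsa code: the three features (mean, standard deviation, lag-1 autocorrelation), the plain z-score normalization, the Euclidean nearest-match search, and all function names are assumptions chosen for brevity. hctsa itself computes thousands of features in Matlab and provides its own normalization options.

```python
import math

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a sequence (0.0 if it has zero variance)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x)
    if var == 0:
        return 0.0
    cov = sum((x[i] - mu) * (x[i + 1] - mu) for i in range(n - 1))
    return cov / var

def extract_features(x):
    """Map one time series to a small, hypothetical feature vector."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [mu, sd, lag1_autocorr(x)]

def zscore_columns(matrix):
    """Normalize each feature (column) to zero mean and unit variance across series."""
    normed_cols = []
    for col in zip(*matrix):
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col))
        normed_cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*normed_cols)]

def nearest_match(matrix, target_idx):
    """Index of the row closest (Euclidean distance) to row target_idx."""
    best, best_dist = None, math.inf
    for i, row in enumerate(matrix):
        if i == target_idx:
            continue
        d = math.dist(row, matrix[target_idx])
        if d < best_dist:
            best, best_dist = i, d
    return best

# Usage: four short synthetic series; each row of X is one series' normalized features.
series = {
    "sine": [math.sin(0.3 * t) for t in range(200)],
    "sine_shifted": [math.sin(0.3 * t + 1.0) for t in range(200)],
    "alternating": [(-1) ** t for t in range(200)],
    "ramp": [t / 200 for t in range(200)],
}
names = list(series)
X = zscore_columns([extract_features(x) for x in series.values()])
print(names[nearest_match(X, names.index("sine"))])  # sine_shifted
```

Normalizing before comparing matters because raw features live on very different scales. Without it, a feature with large values would dominate the distance.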

Feel free to email me for help with real-world applications of hctsa 🤓

If you use this software, please read and cite these open-access articles:

Feedback, via email, GitHub issues, or pull requests, is much appreciated.

For commercial use of hctsa, including licensing and consulting, contact Engine Analytics.

Getting started

📖 📖 Comprehensive documentation 📖 📖 for hctsa is on GitBook.

Downloading the repository

For users unfamiliar with git, the current version of the repository can be downloaded by simply clicking the green Clone or download button, and then clicking Download .zip.

It is recommended to use the repository with git. For this, please make a fork of it, clone it to your local machine, and then set an upstream remote to keep it synchronized with the main repository e.g., using the following code:

git remote add upstream git@github.com:benfulcher/hctsa.git

(Make sure that you have generated an SSH key and associated it with your GitHub account.)

You can then update to the latest stable version of the repository by pulling the master branch to your local repository:

git pull upstream master

For analyzing specific datasets, we recommend working outside of the repository so that incremental updates can be pulled from the upstream repository. Details on how to merge the latest version of the repository with the local changes in your fork can be found here.

hctsa licenses

Internal licenses

There are two licenses applied to the core parts of the repository:

  1. The framework for running hctsa analyses and visualizations is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. A license for commercial use is available from Engine Analytics.

  2. Code for computing features from time-series data is licensed under the GNU General Public License version 3.

A range of external code packages are provided in the Toolboxes directory of the repository, and each has its own associated license (as outlined below).

External packages and dependencies

The following Matlab toolboxes are used by hctsa and are required for full functionality of the software. If some toolboxes are unavailable, hctsa can still be used, but only a reduced set of time-series features will be computed.

  1. Statistics Toolbox
  2. Signal Processing Toolbox
  3. Curve Fitting Toolbox
  4. System Identification Toolbox
  5. Wavelet Toolbox
  6. Econometrics Toolbox

The following time-series analysis packages are provided with the software (in the Toolboxes directory), and are used by our main feature extraction algorithms to compute meaningful structural features from time series:

Publications

Our publications

See the following publications for details of how hctsa was developed and has since been extended, as well as some example applications:

  • Feature-based time-series analysis for a self-organizing living library of time-series data, CompEngine 📗 : B.D. Fulcher, C.H. Lubba, S.S. Sethi & N.S. Jones. CompEngine: A self-organizing, living library of time-series data. arXiv (2019). Link.
  • A reduced set of 22 efficiently coded features 📗 : C.H. Lubba, S.S. Sethi, P. Knaute, S.R. Schultz, B.D. Fulcher & N.S. Jones. catch22: CAnonical Time-series CHaracteristics. Data Mining and Knowledge Discovery 33, 1821 (2019). Link. Code.
  • Implementation paper introducing the hctsa package, with applications to high-throughput phenotyping of C. elegans and Drosophila movement time series 📗 : B.D. Fulcher & N.S. Jones. hctsa: A Computational Framework for Automated Time-Series Phenotyping Using Massive Feature Extraction. Cell Systems 5, 527 (2017). Link.
  • Introduction to feature-based time-series analysis 📗 : B.D. Fulcher. Feature-based time-series analysis. Feature Engineering for Machine Learning and Data Analytics, CRC Press, 87-116 (2018). Link, Preprint.
  • Application to fMRI data 📗 : S.S. Sethi, V. Zerbi, N. Wenderoth, A. Fornito, B.D. Fulcher. Structural connectome topology relates to regional BOLD signal dynamics in the mouse brain. Chaos 27, 047405 (2017). Link, Preprint.
  • Application to time-series data mining 📗 : B.D. Fulcher & N.S. Jones. Highly comparative feature-based time-series classification. IEEE Trans. Knowl. Data Eng. 26, 3026 (2014). Link.
  • Application to fetal heart rate time series 📗 : B.D. Fulcher, A.E. Georgieva, C.W.G. Redman, N.S. Jones. Highly comparative fetal heart rate analysis. 34th Ann. Int. Conf. IEEE EMBC 3135 (2012). Link.
  • Original paper, showing that the behavior of thousands of time-series methods on thousands of different time series can provide structure to the interdisciplinary time-series analysis literature 📗 : B.D. Fulcher, M.A. Little, N.S. Jones. Highly comparative time-series analysis: the empirical structure of time series and their methods. J. Roy. Soc. Interface 10, 20130048 (2013). Link.

Other Publications

Here are some examples of external use of hctsa. Let me know if I've missed any!

  • Feature selection using genetic algorithms for fetal heart rate analysis. Paper.
  • Evaluating asphalt irregularity from smartphone sensors. Paper.
  • Assessing muscles for clinical rehabilitation. Paper.
  • Detecting mild cognitive impairment using single-channel EEG to measure speech-evoked brain responses. Paper.
  • Non-intrusive load monitoring for appliance detection and electrical power saving for buildings. Paper.
  • Classification of heartbeats measured using single-lead ECG. Paper.
  • Hand gesture recognition. Paper.

Acknowledgements

Many thanks go to Romesh Abeysuriya for helping with the MySQL database set-up and install scripts, and Santi Villalba for lots of helpful feedback and advice on the software.

Related resources

CompEngine

An accompanying web resource for this project is CompEngine, which allows users to upload and compare thousands of diverse types of time-series data. The vast and growing collection of time-series data can also be downloaded.

catch22

Is over 7000 just a few too many features for your application? Do you not have access to a Matlab license? catch22 has all of your faux rhetorical questions covered. This reduced set of 22 features, determined through a combination of classification performance and mutual redundancy as explained in this paper, is available here as an efficiently coded C implementation with wrappers for python and R.
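The general principle of keeping features that perform well while discarding ones that are redundant with features already kept can be illustrated with a simple greedy filter. This is a hypothetical python simplification, not the procedure used to derive catch22 (which is described in the linked paper). The performance scores, the correlation threshold of 0.8, and the function names are all assumptions.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def greedy_select(scores, values, max_corr=0.8, k=None):
    """
    scores: dict feature_name -> performance score (higher is better)
    values: dict feature_name -> that feature's values across a set of time series
    Visits features from best to worst score and keeps a feature only if its
    |correlation| with every already-kept feature is at most max_corr.
    Stops early once k features are kept (if k is given).
    """
    selected = []
    for name in sorted(scores, key=scores.get, reverse=True):
        if all(abs(pearson(values[name], values[s])) <= max_corr for s in selected):
            selected.append(name)
            if k is not None and len(selected) == k:
                break
    return selected

# Usage: "f_mean_scaled" is almost a rescaled copy of "f_mean", so it is dropped.
values = {
    "f_mean": [1.0, 2.0, 3.0, 4.0],
    "f_mean_scaled": [2.0, 4.0, 6.0, 8.1],
    "f_other": [3.0, 1.0, 4.0, 2.0],
}
scores = {"f_mean": 0.9, "f_mean_scaled": 0.85, "f_other": 0.7}
print(greedy_select(scores, values))  # ['f_mean', 'f_other']
```

A greedy pass like this is order-dependent: a slightly lower-scoring feature can be discarded only because a correlated one happened to rank just above it.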

hctsa datasets

There are a range of open datasets with pre-computed hctsa features. (If you have data to share and host, let me know and I'll add it to this list):

Code for distributing hctsa calculations on a cluster

Matlab code for computing features for an initialized HCTSA.mat file by distributing the computation across a large number of cluster jobs (using PBS or Slurm schedulers) is here.
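A basic way to divide such a workload is to split the time-series indices into contiguous, roughly equal ranges, one range per job. The python sketch below shows only that partitioning step. It is a hypothetical illustration, not the linked Matlab code: the function name and the 1-based, inclusive ranges (chosen to mirror Matlab-style indexing) are assumptions, and a real submission script would also pass each range to the PBS or Slurm scheduler.

```python
def partition_ids(num_series, num_jobs):
    """
    Split time-series indices 1..num_series into at most num_jobs contiguous,
    inclusive ranges whose sizes differ by at most one.
    Returns a list of (first_id, last_id) tuples.
    """
    if num_series < 1 or num_jobs < 1:
        raise ValueError("num_series and num_jobs must be positive")
    num_jobs = min(num_jobs, num_series)  # never create empty jobs
    base, extra = divmod(num_series, num_jobs)
    ranges, start = [], 1
    for j in range(num_jobs):
        size = base + (1 if j < extra else 0)  # first `extra` jobs get one more
        ranges.append((start, start + size - 1))
        start += size
    return ranges

print(partition_ids(10, 3))  # [(1, 4), (5, 7), (8, 10)]
```

Balancing by number of time series is only a first approximation, since computation time also grows with time-series length.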

pyopy

This excellent repository allows users to run hctsa software from within python: pyopy.

hctsaAnalysisPython

Some beginner-level python code for analyzing the results of hctsa calculations is here.

Generating time-series data from synthetic models

A Matlab repository for generating time-series data from diverse model systems is here.

tsfresh

tsfresh is native python code for extracting hundreds of time-series features, with built-in feature filtering; cf. their paper.

tscompdata and tsfeatures

These R packages are by Rob Hyndman. The first, tscompdata, makes available existing collections of time-series data for analysis. The second, tsfeatures, includes implementations of a range of time-series features.

Khiva

Khiva is an open-source library of efficient algorithms to analyse time series in GPU and CPU.

pyunicorn

A python-based nonlinear time-series analysis and complex systems code package, pyunicorn.

About

License: Other


Languages

MATLAB 49.5%, HTML 18.3%, Fortran 14.1%, C 10.0%, C++ 7.3%, Makefile 0.2%, Mathematica 0.2%, Gnuplot 0.2%, Objective-C 0.1%, Shell 0.1%, M4 0.0%, M 0.0%, CSS 0.0%, TeX 0.0%