Distributing hctsa calculations on a computing cluster

Code for distributing highly comparative time-series analysis (hctsa) computations across a computing cluster running PBS or Slurm, using Matlab (without linking to a MySQL database).

A basic pipeline:

Set up a large HCTSA.mat file for your computation on your local machine using TS_Init.
Ensure that the hctsa version on your computing cluster is identical to the local version used to run TS_Init (otherwise results could be inconsistent).
Transfer the (uncomputed) HCTSA.mat file to the cluster.
Set the parameters tsMin, tsMax, and numPerJob in HCTSA_run.sh. These parameters determine how HCTSA.mat will be distributed into segments, each of which will be submitted as a cluster job.
Run HCTSA_run.sh from the parent directory, which should contain the HCTSA.mat file. This generates a set of directories, each containing a subset of the time series. (NB: you may need to grant yourself execute permission first: chmod u+x HCTSA_run.sh)
When all computations are complete, stitch the subsets of the main HCTSA file back together using combineBatchFiles. This yields a fully computed HCTSA.mat file. 😄
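
To illustrate how tsMin, tsMax, and numPerJob might divide the work, here is a minimal sketch that prints the time-series index range each job would cover. It assumes tsMin and tsMax are the first and last time-series indices to compute and that segments are contiguous blocks of numPerJob indices; the function name list_segments and the example values are hypothetical, and this is not the code in HCTSA_run.sh.

```bash
#!/bin/bash
# Hypothetical illustration only: shows one plausible way that tsMin, tsMax,
# and numPerJob could split a range of time-series indices into jobs.

# Print one "start end" pair per job.
list_segments() {
    local tsMin=$1 tsMax=$2 numPerJob=$3
    local start end
    for ((start = tsMin; start <= tsMax; start += numPerJob)); do
        end=$((start + numPerJob - 1))
        # The final segment may contain fewer than numPerJob time series.
        if ((end > tsMax)); then
            end=$tsMax
        fi
        echo "$start $end"
    done
}

# Assumed example values: time series 1 to 1000, 300 per job.
list_segments 1 1000 300
```

With these example values the range splits into four jobs, the last of which covers only the remaining 100 time series. A smaller numPerJob gives more, shorter jobs; a larger one gives fewer, longer jobs.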

benfulcher / distributed_hctsa
