benfulcher / distributed_hctsa

Running hctsa on a cluster (pbs or slurm)

Distributing hctsa calculations on a computing cluster

Code for distributing highly comparative time-series analysis computations with hctsa across a computing cluster (PBS or Slurm) in Matlab, without linking to a MySQL database.

A basic pipeline:

  1. Set up a large HCTSA.mat file for your computation on your local machine using TS_Init.
  2. Ensure that the hctsa version on your computing cluster is identical to the local version used to run TS_Init (otherwise results could be inconsistent).
  3. Transfer the (uncomputed) HCTSA.mat file onto the cluster.
  4. Set the parameters tsMin, tsMax, and numPerJob in HCTSA_run.sh. These parameters determine how HCTSA.mat will be distributed into segments, each of which will be submitted as a cluster job.
  5. Run HCTSA_run.sh from the parent directory, which should contain the HCTSA.mat file. This generates a set of directories, each containing a subset of the time series. (NB: you may need to grant yourself permission to execute the script: chmod u+x HCTSA_run.sh)
  6. When all computations are complete, use combineBatchFiles to stitch the subsections of the main HCTSA file back together. This yields a fully computed HCTSA.mat file. 😄
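
To illustrate step 4, here is a minimal bash sketch of how a range of time-series indices from tsMin to tsMax could be split into segments of at most numPerJob time series, giving one cluster job per segment. It is a hypothetical illustration, not the contents of HCTSA_run.sh. The helper name make_segments and the example values are assumptions, and the sketch only prints the segment boundaries. It does not create directories or submit jobs.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (NOT the real HCTSA_run.sh): split time-series indices
# tsMin..tsMax into segments of at most numPerJob time series each.
set -euo pipefail

tsMin=1        # first time-series index to compute (example value)
tsMax=10       # last time-series index to compute (example value)
numPerJob=4    # maximum number of time series per cluster job (example value)

# make_segments LO HI STEP: print one "start end" pair per segment.
# The final segment is truncated at HI if the range does not divide evenly.
make_segments() {
    local lo=$1 hi=$2 step=$3
    local start end
    for ((start = lo; start <= hi; start += step)); do
        end=$((start + step - 1))
        if ((end > hi)); then
            end=$hi
        fi
        echo "$start $end"
    done
}

make_segments "$tsMin" "$tsMax" "$numPerJob"
```

With these example values the sketch prints three segments (1 to 4, 5 to 8, and 9 to 10). In general the number of jobs is ceil((tsMax - tsMin + 1) / numPerJob). A smaller numPerJob gives more, shorter jobs, which may suit clusters with strict wall-time limits.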

Languages

Shell 56.3%, MATLAB 43.7%