benfulcher / hctsa

Highly comparative time-series analysis

Home Page: https://time-series-features.gitbook.io/hctsa-manual/

Multivariate time series analysis

smetanadvorak opened this issue

Hello,

Thank you for such an amazing tool. I'm wondering how to approach multivariate time series using hctsa. Is there a special way of assigning the keywords before using TS_LabelGroups?

Thank you in advance,
Konstantin

Hi Konstantin. No worries, and thanks for your interest ☺️
Depending on your problem, you can use hctsa for multivariate time series in different ways. You'll probably want to incorporate some measures of coupling between time series (which are not part of hctsa). How best to assign labels to time series using TS_LabelGroups depends on the problem. If you give me more info, perhaps I can give more specific advice...
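For the keyword question specifically: keywords are assigned per time series in the input file you give to TS_Init, and TS_LabelGroups then turns matching keywords into group labels. A rough sketch of that workflow (the file name, label strings, and keyword strings are just placeholders; see the hctsa manual for the exact input format):

```matlab
% Each time series gets a comma-delimited keyword string in the input file;
% TS_LabelGroups later matches class keywords against these.
timeSeriesData = {randn(1000,1); randn(1000,1)};      % your recordings go here
labels   = {'subj1_trial1_ch1'; 'subj1_trial2_ch1'};  % unique name per time series
keywords = {'classA,ch1'; 'classB,ch1'};              % class keyword + any extra tags
save('INP_emg.mat','timeSeriesData','labels','keywords');

TS_Init('INP_emg.mat');        % initialize HCTSA.mat from the input file
% ... run TS_Compute and TS_Normalize as usual ...
TS_LabelGroups('norm',{'classA','classB'});  % assign the two class labels by keyword
```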

Thanks for the quick answer, Ben.
I'm working on a two-class classification problem where each data instance contains 8-channel EMG. Let's assume that I use only single-channel features (no coupling between time series). The way I see it: I extract the features for each of the 8 channels, then stack them into one feature vector. Then I do everything needed to run TS_TopFeatures and ... get the top feature-channel pairs :)
Well, having written it out, I see that this approach can't be realised just by some smart labelling. I should probably look for the top features of each channel separately.
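To make the stacking step concrete, here is a rough MATLAB sketch, assuming featMats is a hypothetical cell array where featMats{c} holds the trials-by-features hctsa matrix for channel c:

```matlab
% featMats{c}: [numTrials x numFeatures] feature matrix for channel c (hypothetical)
numChannels = numel(featMats);
numFeatures = size(featMats{1},2);

% Stack channels side by side: one wide vector of feature-channel pairs per trial
Xstacked = horzcat(featMats{:});    % [numTrials x numFeatures*numChannels]

% Bookkeeping so each stacked column can be traced back to its (feature, channel) pair
chanOfCol = repelem(1:numChannels,numFeatures);   % channel index of each column
featOfCol = repmat(1:numFeatures,1,numChannels);  % feature index of each column
```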

Yes, you could combine them as you say (expanding the columns), but that would involve some manual work if you want to hack it into the hctsa architecture. The simple case of being channel-blind (i.e., just labeling all rows by class) could be done easily with TS_TopFeatures. That might give you an indication of how much signal is in your data.
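For the channel-blind case, a minimal sketch (assuming the class keywords were assigned at TS_Init time, and using the default settings of TS_TopFeatures):

```matlab
% Channel-blind: every row (one channel of one trial) is labeled only by its class,
% so features are ranked with all channels pooled together.
TS_LabelGroups('norm',{'classA','classB'});   % group rows by class keyword only
TS_TopFeatures('norm');                       % rank features for the two-class problem
```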
If you do go down the route of expanding the columns into all feature-channel pairs, you'll likely need to reduce the number of features (it will depend on the size of your dataset, but it is typically hard to constrain a learning problem with 8 × 7,000 ≈ 56,000 features!). There are many approaches to doing this: they could be data-driven on your own data (e.g., dimensionality reduction, or some hard clustering), or use an independent, high-dimensional dataset (e.g., the Empirical1000 set).
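As one data-driven example, you could run PCA on the stacked matrix; a rough sketch (Xstacked as in the earlier snippet, and the 90% variance threshold is an arbitrary choice):

```matlab
% Reduce the stacked feature-channel columns before feeding a classifier
Xz = zscore(Xstacked);                     % put all features on a common scale
[~,score,~,~,explained] = pca(Xz);         % principal components of the feature space
numPCs = find(cumsum(explained) >= 90,1);  % keep enough components for ~90% variance
Xreduced = score(:,1:numPCs);              % [numTrials x numPCs] classifier input
```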

Thank you, Ben, and sorry for the late reply! I'll think about it and report back here if I come up with a solution.
All the best.