DTW uses feature data instead of raw data

Question


isaacngym opened this issue 5 years ago

Hi,

Perhaps I'm not understanding your code properly, but I was hoping you could clear some things up. When performing DTW, your code reads in data/UCI-HAR-Dataset/train/X_train.txt, which I believe is the feature-engineered dataset: it contains 561 columns rather than 128, and 128 is the number of observations per window according to the dataset's README.
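For comparison, the standard UCI HAR distribution ships the raw windows the README describes under `train/Inertial Signals/` (e.g. `body_acc_x_train.txt`), one 128-sample window per row. The sketch below assumes that layout is also present under the repo's `data/UCI-HAR-Dataset` folder; `load_raw_channel` and `check_window_length` are hypothetical helpers, not part of the repo.

```python
import os
from typing import List

# Assumption: standard UCI HAR layout, where each file under
# "<split>/Inertial Signals/" holds one 128-sample window per row.
CHANNELS = [
    "body_acc_x", "body_acc_y", "body_acc_z",
    "body_gyro_x", "body_gyro_y", "body_gyro_z",
    "total_acc_x", "total_acc_y", "total_acc_z",
]


def load_raw_channel(root: str, split: str, channel: str) -> List[List[float]]:
    """Return one list of floats per window (row) for a single raw channel."""
    path = os.path.join(root, split, "Inertial Signals", f"{channel}_{split}.txt")
    with open(path) as f:
        # Values are whitespace-separated; blank lines are skipped.
        return [[float(v) for v in line.split()] for line in f if line.strip()]


def check_window_length(windows: List[List[float]], expected: int = 128) -> None:
    """Raise ValueError if any window does not have the expected sample count."""
    for i, w in enumerate(windows):
        if len(w) != expected:
            raise ValueError(f"window {i} has {len(w)} samples, expected {expected}")
```

A call such as `load_raw_channel("data/UCI-HAR-Dataset", "train", "body_acc_x")` would then return one sequence per window, and those time-ordered sequences are what DTW is designed to align.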

The part I don't understand is that the features are all derived from the same observation window. (I checked this by adding an assertion in KnnDtw._dtw_distance().) In that case, how can you perform DTW on them when there is no time axis to warp? Or, more likely, am I misunderstanding DTW?
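To make the question concrete, here is a minimal textbook DTW sketch (unconstrained, absolute-difference local cost). It is not the repo's `KnnDtw._dtw_distance()`, which may differ in details such as a warping window; the point is only that the recurrence walks along the array index, whatever that index happens to mean.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(n*m) DTW with |x - y| local cost and no warping window.

    The warping path moves along the index of each array. For a raw window
    that index is time; for a 561-column feature row it is just the column
    number, so "warping" there can match unrelated features to each other.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

For example, `dtw_distance([0.0, 0.0, 1.0, 2.0], [0.0, 1.0, 2.0])` is `0.0`, because the repeated first sample is absorbed by the warping; that behaviour is only meaningful when a repeated index really corresponds to a slower pace in time.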

Eddie-yz · Answer 1 · Sun Jun 07 2020 13:03:06 GMT+0800 (China Standard Time)

I also found this confusing. I think DTW should work on raw time-series data, because its purpose is to find an optimal alignment of two sequences across time. Here, however, the data in X_train.txt and X_test.txt are features extracted from the raw data; they are essentially flat representations of the time series. I think you may have made a mistake here. Please correct me if I'm wrong.
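A small synthetic illustration of the alignment point: the same bump occurring at two different times. Assuming unconstrained DTW with absolute-difference cost, DTW can shift one signal against the other, while a lock-step (index-by-index) L1 distance cannot. The signals and the helpers `lockstep_l1` and `bump` are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Unconstrained DTW with |x - y| local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def lockstep_l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Compare sample i of a only with sample i of b (no warping)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def bump(center: float, n: int = 40, width: float = 3.0) -> List[float]:
    """A Gaussian-shaped bump centred at `center`, sampled at t = 0..n-1."""
    return [math.exp(-((t - center) / width) ** 2) for t in range(n)]


x, y = bump(12), bump(20)  # same shape, shifted by 8 samples
print(f"lock-step L1: {lockstep_l1(x, y):.3f}  DTW: {dtw_distance(x, y):.3g}")
```

Because the diagonal path is always available to DTW, its distance can never exceed the lock-step L1 distance for equal-length inputs; the gap is large here only because the two signals share a shape that is displaced in time.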

Jeroenjww · Answer 2 · Sat Feb 27 2021 08:45:50 GMT+0800 (China Standard Time)

I didn't understand the code either, because of the use of the feature data. I think it makes no sense to apply DTW here, as the features are not ordered in time. The code itself might be correct, but the way it is used is not. The model's correct predictions probably come from the fact that the feature vectors, plotted as curves, are most similar to those of the same class and therefore have the shortest warping path to that class.
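One way to probe this point is to reorder feature columns. Column order in X_train.txt is a labelling convention, not time, so a sensible distance between feature rows arguably should not depend on it. The sketch below uses hypothetical 3-feature rows and a textbook unconstrained DTW (not the repo's implementation) to show that Euclidean distance is unchanged when the same column permutation is applied to both rows, whereas DTW is not.

```python
import math
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Unconstrained DTW with |x - y| local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def permute(row: Sequence[float], order: List[int]) -> List[float]:
    """Reorder feature columns; the order carries no temporal meaning."""
    return [row[k] for k in order]


# Hypothetical 3-feature rows (not taken from UCI HAR).
a, b = [0.0, 0.0, 5.0], [0.0, 5.0, 5.0]
order = [0, 2, 1]  # swap the last two columns in both rows

print(euclidean(a, b), euclidean(permute(a, order), permute(b, order)))  # 5.0 5.0
print(dtw_distance(a, b), dtw_distance(permute(a, order), permute(b, order)))  # 0.0 5.0
```

So a DTW-based nearest-neighbour classifier on feature rows can still work when same-class rows are close overall, but its distances partly reflect an arbitrary column order rather than any alignment in time.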