johannfaouzi / pyts

A Python package for time series classification

Home Page: https://pyts.readthedocs.io

WEASEL+MUSE large number of features

wwjd1234 opened this issue

Description

When using WEASELMUSE for multivariate time series classification, the transformer outputs a very large number of features (650,000). Also, the counts in the histogram are sometimes all zero for some examples. Is this expected?

Steps/Code to Reproduce

I used my own dataset; the result of X_weasel was an ndarray of size 1500 x 650000.
The 1500 makes sense, as this is the number of examples I had, but the 650000 seems large.
I used the code below. When running the same code on the basic motions example dataset, I also get similar results: a large number of features and some examples whose features are all zeros, so when I plot the histogram for those examples there is nothing to plot.

    import numpy as np
    from pyts.multivariate.transformation import WEASELMUSE

    transformer = WEASELMUSE(strategy='uniform', word_size=4, window_sizes=np.arange(5, 70), sparse=False)
    X_weasel = transformer.fit_transform(X_train, y_train)
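
For reference, both observations can be checked directly on the transformed array (a quick sketch; X_weasel is assumed to be the dense output of the snippet above):

    print(X_weasel.shape)                          # (n_samples, n_features), e.g. (1500, ~650000)
    all_zero = np.where(~X_weasel.any(axis=1))[0]  # indices of examples whose word counts are all zero
    print(len(all_zero), 'examples have an all-zero histogram')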

Versions

NumPy 1.20.3
SciPy 1.6.3
Scikit-Learn 0.24.2
Numba 0.53.1
Pyts 0.11.0

Actually it's not that surprising: in a nutshell, this algorithm mainly consists of extracting many features and keeping only the best ones. Also, if a window size is very large compared to the number of time points, the algorithm can only extract a very small number of subsequences for that window size (for instance, with 100 time points and a window of size 70, only 31 subsequences per series are extracted with a step of 1). Since the algorithm counts the number of words (each subsequence is transformed into a word), the number of non-zero values is very small, while the number of features is very large.

You have two main approaches to decrease the number of features:

  • Obtaining an array with fewer features by changing the values of some arguments: decreasing the word size (word_size) or the number of window sizes considered (window_sizes), or increasing the threshold on the chi2 statistics (chi2_threshold). In particular, I don't think that considering every window size in a range is necessary, because you will probably extract very correlated features.
  • Decreasing the number of features afterwards by using an estimator from scikit-learn to perform feature selection. Several approaches are well described in the scikit-learn documentation, and there is also a tutorial in which correlated features are removed, which could be relevant. A sketch combining both approaches is given after this list.
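
A minimal sketch combining both approaches (the hyper-parameter values, the ANOVA F-test and the number of kept features are only illustrative choices, and X_train / y_train are assumed to be the same arrays as in the original snippet):

    import numpy as np
    from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
    from pyts.multivariate.transformation import WEASELMUSE

    # Fewer, more spaced-out window sizes and a stricter chi2 filter already
    # produce a much smaller array.
    transformer = WEASELMUSE(strategy='uniform', word_size=4,
                             window_sizes=np.arange(5, 70, 10),
                             chi2_threshold=5, sparse=False)
    X_weasel = transformer.fit_transform(X_train, y_train)

    # Drop the constant (e.g. all-zero) word-count columns, then keep the
    # features most associated with the labels.
    X_weasel = VarianceThreshold().fit_transform(X_weasel)
    selector = SelectKBest(score_func=f_classif, k=min(1000, X_weasel.shape[1]))
    X_selected = selector.fit_transform(X_weasel, y_train)
    print(X_selected.shape)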

Hope this helps you a bit.

Thank you for the explanation. I was able to reduce the size of the feature space by adjusting the values as you mentioned: decreasing the word size and setting two values for the window sizes instead of a range. However, this affects the classification accuracy: when I keep the 650,000 features I get excellent accuracies, but lower ones otherwise.

Great if it's working well with the first set of values for the hyper-parameters.

I don't know if it's necessary to mention it, but it's mandatory to perform cross-validation to evaluate a model: it's really easy to overfit any machine learning algorithm on a dataset of 1,500 samples and 650,000 features.
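
A minimal cross-validation sketch, assuming X_train is the 3D array of multivariate series used above and using a logistic regression classifier only as an example; refitting WEASELMUSE inside each fold means its chi2-based word selection never sees the validation samples:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from pyts.multivariate.transformation import WEASELMUSE

    pipeline = make_pipeline(
        WEASELMUSE(strategy='uniform', word_size=4,
                   window_sizes=np.arange(5, 70), sparse=False),
        LogisticRegression(max_iter=1000),
    )
    scores = cross_val_score(pipeline, X_train, y_train, cv=5)
    print(scores.mean(), scores.std())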