angus924 / minirocket

MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification

Channels get lost

lgalarra opened this issue · comments

Dear all,

I am applying MiniRocket to a set of multivariate time series with 7 channels and 8,020 data points each.

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from minirocket_multivariate import fit, transform
import numpy as np

# Assumes df_final_extendedVelocidadAnglarDerivadaRadio is already prepared with your data
X = df_final_extendedVelocidadAnglarDerivadaRadio.drop(columns=['etiqueta']).values
y = df_final_extendedVelocidadAnglarDerivadaRadio['etiqueta'].values

# Reshape X into the shape expected by MiniRocket
X_reshaped = X.reshape(-1, 7, 8020)
X_reshaped = X_reshaped.astype(np.float32)

# Fit the MiniRocket parameters
params = fit(X_reshaped, num_features=100, max_dilations_per_kernel=84)

# Transform the data using MiniRocket
X_transformed = transform(X_reshaped, params)

When I print X_reshaped.shape I get: (240, 7, 8020)

However, the transformation using minirocket_multivariate.transform() returns an X_transformed with dimensions (240, 84). I would have expected (240, 7, 84). Is my assumption correct? If so, am I doing anything wrong? Your help will be highly appreciated.

Best,
Luis

Hi @lgalarra, thanks for your question, sorry for the late reply.

If I understand the situation correctly, the minirocket output is expected. The way it works is that minirocket convolves the input with a set of convolutional kernels, and then applies global pooling (ppv) over the convolution output for each kernel. For multivariate input, where you have multiple channels per time series, minirocket performs convolution with a multivariate kernel, essentially combining the information from multiple channels (each kernel is assigned a random subset of channels). So the number of output features will be the same, regardless of the number of channels. In other words, the features are not per channel, but rather per time series.
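To make the shape behaviour concrete, here is a rough pure-NumPy sketch of the idea described above (not the actual implementation, which is Numba-compiled and far more optimised, and which uses fixed kernels and quantile-based biases): each kernel is assigned a random subset of channels, the selected channels are combined, convolved once, and pooled with PPV, so the output has one feature per (instance, kernel) and no channel dimension.

```python
import numpy as np

def ppv(conv_out, bias=0.0):
    # proportion of positive values: fraction of convolution outputs above the bias
    return np.mean(conv_out > bias)

rng = np.random.default_rng(0)
n_instances, n_channels, length = 4, 7, 100
X = rng.normal(size=(n_instances, n_channels, length)).astype(np.float32)

num_kernels = 10  # illustrative only; minirocket uses ~10,000 features by default
kernel_length = 9
features = np.zeros((n_instances, num_kernels), dtype=np.float32)

for k in range(num_kernels):
    # each kernel gets a random subset of channels (fixed once chosen)
    n_sel = rng.integers(1, n_channels + 1)
    channels = rng.choice(n_channels, size=n_sel, replace=False)
    weights = rng.choice([-1.0, 2.0], size=kernel_length)  # minirocket-style two-valued weights
    for i in range(n_instances):
        # sum the selected channels, then convolve once — equivalent to a
        # multivariate kernel with identical weights on every selected channel
        combined = X[i, channels].sum(axis=0)
        conv_out = np.convolve(combined, weights, mode="valid")
        features[i, k] = ppv(conv_out)

# one feature per (instance, kernel) — no per-channel dimension survives
assert features.shape == (n_instances, num_kernels)
```

So for 240 instances the output is (240, num_features) regardless of whether the input has 1 channel or 7.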

I note that the default number of features is ~10,000. I would expect the performance to be fairly poor with only ~100 features.

I hope that helps a bit?

Let me know if you have any further questions.

Best,

Angus

Hi Angus,

Thank you very much for your prompt reply. It is very clear now. Is there a place where MiniRocket's handling of multivariate series is explained in more detail?

Kind regards,
Luis

Hi Luis,

Not really, unfortunately, but maybe I can give a quick overview.

Basically, we randomly assign a subset of channels (between 1 and 9) to each kernel/dilation combination. Once assigned, this combination is fixed (i.e., the same channels are used for a given kernel/dilation combination every time you call "transform", so it's the same on the training and test sets).
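A loose sketch of that assignment step (the function name here is hypothetical; in the real code the subsets are drawn inside fit and stored in the returned parameters, which is what keeps them fixed between training and test):

```python
import numpy as np

def fit_channel_assignments(n_channels, n_combinations, seed=0):
    # choose, once, a random subset of channels for each kernel/dilation combination
    rng = np.random.default_rng(seed)
    assignments = []
    for _ in range(n_combinations):
        # between 1 and 9 channels (capped by how many channels the data has)
        n_sel = rng.integers(1, min(n_channels, 9) + 1)
        assignments.append(np.sort(rng.choice(n_channels, size=n_sel, replace=False)))
    return assignments

# the assignments are part of the fitted parameters, so the exact same channel
# subsets are reused when transforming both the training and the test set
train_params = fit_channel_assignments(n_channels=7, n_combinations=5, seed=0)
reused_params = fit_channel_assignments(n_channels=7, n_combinations=5, seed=0)
assert all(np.array_equal(a, b) for a, b in zip(train_params, reused_params))
```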

Then, when we go to perform the convolution operation for a given kernel/dilation, we add all those channels together. This is equivalent to using the same kernel on each channel, i.e., a multivariate kernel where each channel in the kernel has the same weights.

Here's the relevant code link:

C = C_alpha[channels_this_combination] + \
    C_gamma[index_0][channels_this_combination] + \
    C_gamma[index_1][channels_this_combination] + \
    C_gamma[index_2][channels_this_combination]
C = np.sum(C, axis = 0)

The first statement selects the relevant channels; the second adds them all together.

So in this way we combine information from multiple channels.
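That equivalence (summing the channels and convolving once, versus convolving each channel with the same kernel and summing the outputs) follows from the linearity of convolution, and is easy to check with plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(42)
channels = rng.normal(size=(3, 50))       # 3 selected channels, each of length 50
kernel = rng.choice([-1.0, 2.0], size=9)  # a two-valued minirocket-style kernel

# (a) sum the channels first, then convolve once
a = np.convolve(channels.sum(axis=0), kernel, mode="valid")

# (b) convolve each channel with the same kernel, then sum the outputs
b = sum(np.convolve(c, kernel, mode="valid") for c in channels)

# convolution is linear, so the two results are identical
assert np.allclose(a, b)
```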

The advantage of randomly selecting channels is that, even if we don't know which channels are important or unimportant, we should get a good spread of information across all the channels by spreading them out over all the kernels (unless there's a truly massive number of channels, you'll probably get good coverage, even though only a small number of channels is assigned to any particular kernel). You can also potentially pick up on different interactions between channels by having all sorts of different combinations. And overall it's pretty efficient, as you're only ever dealing with a small number of channels per kernel. (Having said that, the multivariate version of the code is slower than the regular/univariate version, for reasons I've never been able to figure out.)

Hope that helps a bit!

Let me know if you have any more questions.

Best,

Angus

Dear Angus,

Thanks for the detailed explanation. That helps a lot!

Kind regards,
Luis