angus924 / minirocket

MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification


datatype

dfx1822375 opened this issue · comments


When I use my own data with MiniRocket in PyCharm, I get a problem with the dtype:

Traceback (most recent call last):
  File "E:/PycharmProjects/minirocket-main/code/traintest.py", line 46, in <module>
    parameters = fit(X_training)
  File "E:\PycharmProjects\minirocket-main\code\minirocket.py", line 130, in fit
    biases = _fit_biases(X, dilations, num_features_per_dilation, quantiles)
  File "E:\ProgramData\Anaconda3\envs\deepl\lib\site-packages\numba\core\dispatcher.py", line 703, in _explain_matching_error
    raise TypeError(msg)
TypeError: No matching definition for argument type(s) pyobject, array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)

How can I fix this?

Hi @dfx1822375.

Thanks for your message.

I'm not 100% sure what the problem is from the information you have provided. It might be that something isn't quite right with the format of your data, X_training.

Could you check the shape and dtype of X_training?

X_training should have two dimensions (each row is a time series), and the dtype should be np.float32.
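For example, a quick check along these lines (a sketch; the loading step is a hypothetical stand-in for however you read your data):

import numpy as np

X_training = np.loadtxt("training.csv", delimiter = ",")  # hypothetical loading step

print(X_training.shape)  # should have 2 dimensions: (num_examples, series_length)
print(X_training.dtype)  # fit(...) expects np.float32

X_training = X_training.astype(np.float32)  # cast before calling fit(...) if necessary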

Thanks very much.


Hi @angus924
I just modified the program and successfully ran the code, but I found that the validation accuracy on my own dataset is low, only about 50%. Is there any way to improve the accuracy? And how can I run it on a GPU? Thanks!

Hi @dfx1822375.

The method won't necessarily work well on every dataset.

If you could tell me a little bit more about your data (how many time series, time series length, number of classes, etc), and maybe show a plot of a couple of the time series, I can think about whether there's anything that we might be able to improve. However, I'm not sure; the method may simply not work very well with your data.

Re GPU, there is a PyTorch implementation available in the tsai library, here: MINIROCKET_Pytorch.py.

I hope that helps.

Hi @angus924.

The data is the amplitude of some WiFi signals. There are 7500 samples, and the original shape is 7500x3x30x200; following your suggestion, I reshaped it to 7500x18000, so the sequence length is 18000. The samples cover 150 action classes. I use 7000 samples as the training set and the remaining 500 as the validation set. I would also like to know the dimensionality of the features extracted by MiniRocket: can I interpret the extracted features and design a suitable classifier to improve the accuracy?
As for the GPU, I have seen tsai before, but the examples I found are about the UCR datasets, and I don't know how to build my own dataset, so I didn't use it. If you know anything about this, I would appreciate your sharing it.

Hi @dfx1822375, thanks very much for the additional information. Sorry for the slow response.

Ok, so there are 7500 samples (1st dimension of the data). What are the 2nd and 3rd dimensions? I assume the last (4th) dimension is the time dimension, is this correct?

It looks like these are multivariate time series, so my guess is that the multivariate version of MiniRocket (minirocket_multivariate.py) will work better, but I don't know for sure. This version of the code will work for time series with multiple channels, i.e., [num_samples, num_channels, time].

Depending on what the 2nd and 3rd dimensions of your data actually represent, it might make sense to reshape the data to be in the shape, e.g., [7500, 90, 200]. Could you show me a plot of one of the samples? That might help me to understand what would be a good approach.
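In NumPy, that reshape is just, e.g. (a sketch, assuming the 4D array is named X):

X = X.reshape(7500, 3 * 30, 200)  # merge the 2nd and 3rd dimensions into 90 channels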

MiniRocket will always (by default) output 9,996 features for each time series, so your output would be in the shape [7000, 9996] for the training set. (This number, 9,996, represents the closest multiple of 84 to 10,000. We want approximately 10,000 features, and we have 84 kernels. The number of features will always be a multiple of 84.)
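For reference, the arithmetic behind 9,996:

num_kernels = 84
num_features = (10_000 // num_kernels) * num_kernels  # closest multiple of 84 below 10,000: 9996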

In relation to tsai, you can use data in NumPy format, here are some examples: [1], [2].
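For instance, a rough, untested sketch using the MINIROCKET_Pytorch module in tsai (the function names and signatures here are from memory, so please check them against the tsai docs):

from tsai.models.MINIROCKET_Pytorch import MiniRocketFeatures, get_minirocket_features
import numpy as np
import torch

# stand-in for your own NumPy data, shape [num_samples, num_channels, time]
X_training = np.random.randn(7000, 90, 200).astype(np.float32)

device = "cuda" if torch.cuda.is_available() else "cpu"

# fit the transform's parameters on the training data
mrf = MiniRocketFeatures(c_in = 90, seq_len = 200).to(device)
mrf.fit(X_training)

# compute the features, then train any classifier (e.g. ridge / logistic) on them
X_feat = get_minirocket_features(X_training, mrf)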


Hi @angus924:
First of all, I would like to express my sincere appreciation and admiration for your patience and sense of responsibility.
The second dimension is the number of antennas of the WiFi transmitter: as you know, a WiFi AP has multiple antennas, and here we use three transmitting antennas. The third dimension, 30, comes from the OFDM used in WiFi transmission, which generates 30 subcarriers. So each sample has dimensions of 3 antennas x 30 subcarriers x sequence length 200.
Here is the data I took out of a single channel of a single sample for plotting, so it is 30x200:
[image: plot of a single 30x200 channel]
This is a sample converted to 90x200, as you suggested:
[image: plot of a 90x200 sample]
After using the multivariate program, the accuracy is about 76%.
Is there any way I can improve the accuracy? In addition, I would like to see the exact structure of multivariate MiniRocket as a diagram, as I have not seen one in the paper.
Thank you for your guidance. As for using the GPU, I think I need some more time to learn.

Thanks very much @dfx1822375. Thanks for the extra information and the plots, they are very useful.

This is very interesting.

I suppose, ideally, you might have kernels where the shape of the kernel matched the structure of the input, e.g., kernels with shape [3, 30, 9] (for kernels of length 9). Another option would be to do a group / depthwise convolution (groups = 3), applying a different (multichannel) kernel to each of the 3 antenna dimensions.

Neither of those things is possible in MiniRocket without modifying the code quite a bit. However, it would be very straightforward to try this with a simple 1D CNN in PyTorch, for example. It might be worth trying.
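For example, a minimal PyTorch sketch of the group-convolution idea (my illustration, with random weights and made-up sizes, not part of MiniRocket):

import torch

# [batch, antennas, subcarriers, time] -> [batch, 90, time], antenna-major channels
X = torch.randn(8, 3, 30, 200).reshape(8, 90, 200)

# groups = 3 gives each antenna's 30 subcarrier channels its own set of kernels
conv = torch.nn.Conv1d(in_channels = 90, out_channels = 84, kernel_size = 9,
                       groups = 3, padding = 4)

out = conv(X)  # shape [8, 84, 200]: 28 output channels per antenna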

Anyway, I have one more suggestion. Before reshaping the input to [7500, 90, 200], you could try standardising / normalising (i.e., subtracting the mean and dividing by the standard deviation, or just dividing by the standard deviation) across the antenna dimension. Something like this:

_mean = X_training.mean(2, keepdims = True) # shape = [7000, 3, 1, 200]
_std = X_training.std(2, keepdims = True) + 1e-8 # shape = [7000, 3, 1, 200]

X_training = (X_training - _mean) / _std
X_test = (X_test - _mean) / _std

X_training = X_training.reshape(7000, 90, 200)
# etc

In terms of how the multivariate version works... You're right, it's not explained in the paper, it's something we put together afterwards, just to make sure that there was something available for people to use. It's very basic, and can probably be improved.

It works like this. With a multivariate dataset, each time series has 2 or more channels (e.g., maybe 500 channels). In MiniRocket, we have a basic set of 84 kernels, and each kernel is applied with multiple dilation values. We assign a random subset of channels (e.g., channels [14, 56, 257]) to each kernel/dilation combination (e.g., kernel W_0 and dilation d=1). Then, for that particular kernel/dilation combination, we add the assigned channels together to create, in effect, a univariate time series. This is equivalent to applying the same kernel (i.e., kernel W_0 in our example) to each of the relevant channels.
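To make that concrete, here is a tiny NumPy sketch of the channel-combination step (dilation and bias thresholding omitted; the kernel values follow MiniRocket's two-valued {-1, 2} pattern, but the specific kernel and channel indices are illustrative):

import numpy as np

X = np.random.randn(500, 200).astype(np.float32)  # one series: 500 channels, length 200
channels = [14, 56, 257]                          # channels assigned to this kernel/dilation
W_0 = np.array([-1, -1, 2, -1, 2, -1, 2, -1, -1], dtype = np.float32)  # illustrative length-9 kernel

# summing the assigned channels yields, in effect, a univariate series ...
combined = X[channels].sum(axis = 0)

# ... and, because convolution is linear, convolving that sum with W_0 equals
# applying the same kernel W_0 to each assigned channel and adding the results
out_sum = np.convolve(combined, W_0, mode = "valid")
out_each = sum(np.convolve(X[c], W_0, mode = "valid") for c in channels)
assert np.allclose(out_sum, out_each, atol = 1e-4)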


Hi @angus924:
I don't quite understand your first two paragraphs. In PyTorch, as I understand it, the parameters of a convolution kernel are (number of input channels, number of output channels, kernel size). One of my samples has shape 3x30x200, i.e. three input channels, with the last two dimensions perhaps viewed as height and width. The corresponding kernel should then be (3, number of output channels, kernel size), so I don't quite understand your statement "you might have kernels where the shape of the kernel matched the structure of the input".
As for normalisation, I think you are talking about normalising the entire initial dataset and then splitting it into a training set and a test set, not the way you programmed it; after all, the training set and the test set are not the same size. Unfortunately, I found that my accuracy did not improve after normalisation but decreased to some extent, whether I normalised along the second dimension or the first, and I am puzzled by this.
I have a general understanding of how the multi-channel version works, but I would like to ask if there is a drawn network structure for MiniRocket, as there is for VGG, ResNet, etc. You know such pictures make it easier for us to understand a network.
Thank you for your reply!

Hi @dfx1822375.

You are quite right, that was my stupid mistake. You would need to get the mean and/or standard deviation per antenna per sample, so something like the following. (This would produce the same result whether the data was normalised before or after splitting into training / validation / test.)

X_training /= X_training.std(2, keepdims = True) + 1e-8  # std over the subcarrier axis, shape [7000, 3, 1, 200]
X_test /= X_test.std(2, keepdims = True) + 1e-8          # computed on the test set itself, shape [500, 3, 1, 200]

# etc

If both subtracting the mean and dividing by the standard deviation is making accuracy worse, you could try just subtracting the mean (without dividing by the standard deviation), or just dividing by the standard deviation (without subtracting the mean). It may not make any difference. If normalising along these lines does not improve accuracy, or makes accuracy worse, it might suggest that there is information relevant to the class in the mean and/or scale of the data (which gets destroyed by normalisation).

Sorry, I wasn't clear about the kernels. What I meant was, e.g., if you were using the PyTorch conv1d function or equivalent, that you would have a tensor representing your kernels in the shape [num_kernels, antenna_dimension (3), subcarrier_dimension (30), kernel_length]. The antenna dimension and/or the subcarrier dimension could use all channels (i.e., 3 and 30 respectively), or you could subsample the channels, roughly in the same way that channels are subsampled in MiniRocket, for example. Does this make sense?
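As a sketch of what I mean (my illustration; shapes only, with random weights): flattening the antenna and subcarrier dimensions lets torch.nn.functional.conv1d apply kernels that span the whole input structure:

import torch
import torch.nn.functional as F

X = torch.randn(8, 3, 30, 200)  # [batch, antennas, subcarriers, time]
num_kernels, kernel_length = 84, 9

# kernels conceptually shaped [num_kernels, antennas, subcarriers, kernel_length] ...
W = torch.randn(num_kernels, 3, 30, kernel_length)

# ... flattened so conv1d sees 3 * 30 = 90 input channels
out = F.conv1d(X.reshape(8, 90, 200),
               W.reshape(num_kernels, 90, kernel_length),
               padding = kernel_length // 2)

print(out.shape)  # torch.Size([8, 84, 200])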

I don't have a diagram for multivariate MiniRocket at the moment, sorry. If I get the chance to make one, I'll share it with you.

Hi @angus924:
As you might expect, doing just one of the normalisation steps doesn't make much difference.
With 1D convolution, I understand the parameters as follows: torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True). You mean to use 1D convolution on the original data instead of using MiniRocket, right? Does it mean that the features extracted by MiniRocket only fit the original signal to about 70%, according to the current accuracy? Should I try to build a network to further learn from the extracted features?
Also, you mentioned that 9,996 features are extracted, corresponding to 84 kernels, so each kernel corresponds to 119 features. I would like to know how these features are arranged in the result: do features 0 to 118 belong to the first kernel and 119 to 237 to the second, or do features 0 to 118 correspond to one feature of each kernel, and 119 to 237 to the second feature of each kernel?
Thank you very much for saying that you will make a diagram; if convenient, could you draw both the normal MiniRocket and the multivariate MiniRocket networks? Thank you for your reply!

Hi @dfx1822375, sorry again for the delay.

> With 1D convolution, I understand the parameters as follows: torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True). You mean to use 1D convolution on the original data instead of using MiniRocket, right?

Yes, here I was referring to configuring a CNN instead of using MiniRocket. The dimensions I was referring to were for the weight tensor when passed to torch.nn.functional.conv1d(input, weight, ...). I'm not sure whether it's possible to get the right shape when using torch.nn.Conv1d(...), because it's not really a 'standard' shape.

> Does it mean that the features extracted by MiniRocket only fit the original signal to about 70%, according to the current accuracy? Should I try to build a network to further learn from the extracted features?

Yes, based on what you've told me, it looks like the accuracy of MiniRocket on this data is about 70%. I think it's worth trying a simple CNN, just to compare.

> Also, you mentioned that 9,996 features are extracted, corresponding to 84 kernels, so each kernel corresponds to 119 features. I would like to know how these features are arranged in the result: do features 0 to 118 belong to the first kernel and 119 to 237 to the second, or do features 0 to 118 correspond to one feature of each kernel, and 119 to 237 to the second feature of each kernel?

Good question. I'm pretty sure the features are arranged like this for each time series:

dilation_0
  kernel_0
    bias_0
    bias_1
    bias_2
    [...]
    bias_n
  kernel_1
    bias_0
    [...]
  [...]
dilation_1
  kernel_0
    bias_0
    [...]
  [...]
[...]

Does that make sense? There are more features for smaller dilations and fewer features for larger dilations. The number of dilations depends on the length of the input time series, so the number of features per kernel depends on the dataset. It will be different for different datasets.

The exact number is returned by the fit(...) method. Usually just for convenience we use fit(...) like this:

parameters = fit(X)
X_transform = transform(X, parameters)
# etc

However, parameters is a tuple of (dilations, num_features_per_dilation, biases). The num_features_per_dilation parameter is an array of length num_dilations, and each entry corresponds to the number of features per kernel for that dilation. That will allow you to work out which features correspond to which dilation/kernel/bias value, etc.
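As a sketch of that bookkeeping (assuming the univariate minirocket.py from this repository is importable, with toy data standing in for yours):

import numpy as np
from minirocket import fit  # minirocket.py from this repository

X = np.random.randn(100, 500).astype(np.float32)  # toy stand-in data

dilations, num_features_per_dilation, biases = fit(X)

# walk the layout described above: dilation -> kernel -> bias
column = 0
for dilation, n_per_kernel in zip(dilations, num_features_per_dilation):
    for kernel in range(84):  # the 84 fixed kernels
        # columns [column, column + n_per_kernel) of the transform output
        # belong to this dilation/kernel pair, one column per bias value
        column += n_per_kernel

print(column)  # 9996, the total number of features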

> Thank you very much for saying that you will make a diagram; if convenient, could you draw both the normal MiniRocket and the multivariate MiniRocket networks? Thank you for your reply!

I haven't had the chance to do this yet, I'm not sure when I'll get to it. Hopefully soon.

Thanks very much.


Hi @angus924
Thank you very much for your reply. Now that I know the composition of the output features, I would like to ask: if I divide the features according to each kernel, i.e. split the 9,996 features into a matrix of shape (84, 119) according to their respective kernels, and then perform feature learning on that, will it work better than learning from the 9,996 features directly?