angus924 / minirocket

MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification


Accuracy problem

abolfazlzamanii opened this issue · comments

I am trying to run MiniRocket. The accuracy I get is 85%, which is 2% lower than your reported accuracy. I used the code you provided and ran it on only the 109 UCR datasets. You can see the relevant code at the link below. What is the reason for this 2% difference?
https://colab.research.google.com/drive/1YcrWTSF7oNqGeP-C0n-pdi2EzAqYo-_g?usp=sharing

Hi @abolfazlzamanii, thanks for your question.

What are you referring to specifically? Is '85%' the accuracy on a particular dataset, or mean accuracy over all datasets?

When you say 'your accuracy', what are you referring to? There are a number of results files in the GitHub repository. The results in results_ucr109_mean.csv represent mean accuracy over 30 resamples of each dataset. The accuracies on different resamples can be quite different. The results in accuracy_ucr109_resamples.csv show the results for each resample separately. If you are just running the method on the original training/test split for each dataset, I would expect the results to match resample 0 (i.e., the first column) in accuracy_ucr109_resamples.csv, as this corresponds to the original training/test split.

I ran the program 5 times on all 109 datasets and got an average accuracy of 85% across the 109 datasets. Then I compared this with the values in results_ucr109_mean.csv, where the mean accuracy over the 109 datasets is 87%.

Ok, so as I said the results in results_ucr109_mean.csv are mean results over different resamples of the data. This is not the same as what you have done. If I understand correctly, you have used the original training/test split for each dataset.

The results in the first column of accuracy_ucr109_resamples.csv represent the accuracy of the method on the original training/test split for each dataset. The average of these values is ~0.8567. If you ran the method multiple times on this version of the datasets you would get slightly different results, but I would expect the mean accuracy value to be close to this figure.
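To illustrate the distinction, here is a minimal sketch of the two aggregations being compared: averaging only the resample-0 column (original train/test splits) versus averaging over all resamples per dataset and then over datasets. The column names and layout (one row per dataset, one column per resample) are assumptions for illustration, and the accuracy values below are synthetic stand-ins, not the actual results files:

```python
import io
import pandas as pd

# Synthetic stand-in for a file like accuracy_ucr109_resamples.csv
# (assumed layout: one row per dataset, one column per resample).
csv_text = """dataset,resample_0,resample_1,resample_2
Adiac,0.80,0.82,0.81
Beef,0.90,0.88,0.91
Coffee,1.00,0.96,1.00
"""
df = pd.read_csv(io.StringIO(csv_text))

# Mean accuracy on the original train/test splits (resample 0 only).
mean_resample_0 = df["resample_0"].mean()

# Mean per dataset over all resamples, then averaged over datasets --
# the kind of figure a file like results_ucr109_mean.csv summarises.
mean_all_resamples = df.drop(columns="dataset").mean(axis=1).mean()

print(f"mean accuracy, resample 0 only:  {mean_resample_0:.4f}")
print(f"mean accuracy, all resamples:    {mean_all_resamples:.4f}")
```

The two numbers generally differ, which is the same effect as the 85% vs 87% gap discussed above: they are averages over different collections of train/test splits.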

In other words, the difference comes down to the data the method has been trained (and tested) on: specifically, whether you are using the original training/test splits or resamples.