pqy000 / SemiTimeSeries

Semi-supervised time series classification

The main idea is to combine Mean Teacher with the series saliency module. Besides improving the accuracy of the model, it also enhances interpretability, both quantitatively and qualitatively. Compared with prior work that only improves accuracy, this may provide more insight.


Environment

  • scikit-learn 0.22.1
  • numpy 1.16.4
  • pytorch >= 1.7
  • torchgeometry 0.1.2

Dataset

The data comprise 6 publicly available datasets, downloadable from (Download link); save them under the datasets/ directory. The table below gives the detailed parameters of the three datasets on which I have completed experiments.

| Dataset | Train | Test | Dimension | Class |
| --- | --- | --- | --- | --- |
| UWaveGestureLibraryAll | 2688 | 894 | 945 | 8 |
| CricketX | 458 | 156 | 300 | 12 |
| InsectWingbeatSound | 1320 | 440 | 256 | 11 |
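UCR archive splits are plain-text files with the class label in the first column and the series values after it. A minimal loader sketch (the tab delimiter and exact file layout are assumptions about the downloaded files, illustrated here with an in-memory example):

```python
import io
import numpy as np

def load_ucr_split(path_or_buf, delimiter="\t"):
    """Load a UCR-style split: label in column 0, series values after it."""
    data = np.loadtxt(path_or_buf, delimiter=delimiter)
    labels = data[:, 0].astype(int)
    series = data[:, 1:].astype(np.float32)
    return series, labels

# Tiny synthetic example in the UCR layout (2 samples, 4 time steps).
demo = io.StringIO("1\t0.1\t0.2\t0.3\t0.4\n2\t0.5\t0.6\t0.7\t0.8\n")
X, y = load_ucr_split(demo)
print(X.shape, y.tolist())  # (2, 4) [1, 2]
```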

Structure

mainOurs.py exposes several options. The important ones the script takes, along with their descriptions, are listed below:

Options

  • --dataset

    • The experiments cover 16 datasets; the previous papers mainly run experiments on 6 of them. So far I have run each dataset several times (random seeds 0, 1, 2) and recorded the mean and variance. As the experiments show, there is a significant improvement over the previous SOTA results.
  • --model_name

    • It accepts one of three values:
      • SupCE: the supervised training procedure
      • SemiTime: the previous SOTA baseline
      • VT2: our method (VT2 + series saliency)
  • --label_ratio

    • The option is used to limit the proportion of labeled data.
  • --Saliency

    • The option indicates whether to use the series saliency module in MeanTeacher.
  • The remaining options control lower-level details.
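The options above can be sketched as an argparse parser. Only the flag names come from this README; the defaults, types, and help strings here are assumptions:

```python
import argparse

# Sketch of the option parser described above; defaults are assumptions,
# only the flag names are taken from the README.
parser = argparse.ArgumentParser(
    description="Semi-supervised time series classification")
parser.add_argument("--dataset", type=str, default="CricketX")
parser.add_argument("--model_name", type=str, default="VT2",
                    choices=["SupCE", "SemiTime", "VT2"])
parser.add_argument("--label_ratio", type=float, default=0.4,
                    help="proportion of labeled training data")
parser.add_argument("--Saliency", action="store_true",
                    help="enable the series saliency module in MeanTeacher")
parser.add_argument("--gpu", type=int, default=0)

# Example invocation mirroring the usage section below.
args = parser.parse_args(["--model_name", "VT2", "--dataset", "CricketX",
                          "--label_ratio", "0.4", "--Saliency"])
print(args.model_name, args.label_ratio, args.Saliency)  # VT2 0.4 True
```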

Directory

  • optim/
    • Under the optim/ directory, there are several semi-supervised learning methods.
      • generalWay.py contains our implemented method
      • pretrain.py contains the baseline
  • model/
    • The main DL architecture is a temporal convolutional neural network.
  • Dataloader/
    • This directory is required; it contains the data loaders that read the UCR time series classification data. In our implementation, the data used to compute the consistency loss are sampled from both the labeled and unlabeled sets.
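The sampling step described above can be sketched in a few lines. This is a simplified numpy illustration of drawing a consistency-loss batch from the union of labeled and unlabeled examples, not the repository's actual loader:

```python
import numpy as np

def sample_consistency_batch(x_labeled, x_unlabeled, batch_size, rng):
    """Draw a consistency-loss batch from the union of labeled and
    unlabeled examples, as the Dataloader/ description suggests."""
    pool = np.concatenate([x_labeled, x_unlabeled], axis=0)
    idx = rng.choice(len(pool), size=batch_size, replace=False)
    return pool[idx]

rng = np.random.default_rng(0)
x_lab = np.zeros((10, 128))   # 10 labeled series of length 128
x_unl = np.ones((30, 128))    # 30 unlabeled series
batch = sample_consistency_batch(x_lab, x_unl, batch_size=8, rng=rng)
print(batch.shape)  # (8, 128)
```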

Usage example

Some example commands for running the code:

```shell
## MeanTeacher
python mainOurs.py --model_name VT2 --dataset=CricketX --gpu=2 --label_ratio 0.4
## SemiTime method
python mainOurs.py --model_name SemiTime --dataset=CricketX --gpu=2 --label_ratio 0.4
## Supervised method
python mainOurs.py --model_name SupCE --dataset=CricketX --gpu=2 --label_ratio 0.4
```

Architecture

The model architecture is straightforward: we apply the VT2 method to semi-supervised learning on time series and combine it with the previously proposed series saliency module. The figure illustrates the design idea; the implementation details are in the code. At present, the algorithm significantly improves accuracy, and we have validated that the series saliency module is helpful in semi-supervised learning. This is good news! 🎉 🎉 😄
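The Mean Teacher component keeps a teacher model whose weights are an exponential moving average of the student's weights. A minimal sketch of that update (the decay value 0.99 is a typical choice, not taken from this repository):

```python
import numpy as np

def ema_update(teacher_params, student_params, alpha=0.99):
    """Mean Teacher update: the teacher's weights track an exponential
    moving average (EMA) of the student's weights after each step."""
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_params, student_params)]

# Toy parameters: teacher starts at 0, student is at 1.
teacher = [np.zeros(3)]
student = [np.ones(3)]
teacher = ema_update(teacher, student, alpha=0.9)
print(teacher[0])  # [0.1 0.1 0.1]
```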

The second part uses series saliency for interpretation in time series semi-supervised learning. I will implement the code and migrate it from time series forecasting to time series classification, providing more quantitative and qualitative analysis. The motivation is to observe the learning procedure as the label size increases; this may require more domain knowledge and some cherry-picked visualizations.
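To make the interpretation idea concrete, here is a simple perturbation-style saliency sketch: slide a window over the series, replace it with its mean, and record how much the model's score changes. This is only an illustration of the general occlusion idea, not the repository's series saliency module:

```python
import numpy as np

def occlusion_saliency(series, predict_fn, window=4):
    """Attribute importance to time steps by occluding a sliding window
    (replacing it with its mean) and measuring the score change."""
    base = predict_fn(series)
    saliency = np.zeros_like(series)
    for start in range(0, len(series) - window + 1):
        perturbed = series.copy()
        perturbed[start:start + window] = perturbed[start:start + window].mean()
        saliency[start:start + window] += abs(base - predict_fn(perturbed))
    return saliency

# Toy model: the score is just the value at time step 5, so the
# saliency map should peak at that time step.
x = np.arange(16, dtype=float)
sal = occlusion_saliency(x, predict_fn=lambda s: s[5])
print(sal.argmax())  # 5
```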

Finally, I believe the easy-to-implement series saliency can significantly improve both prediction accuracy and interpretability, contributing to time series semi-supervised learning!

Experiments results

We mainly compare against the two latest papers on time series semi-supervised learning. The second paper reproduces the results of the first and also implements some other baseline methods (such as the Pi model and pseudo-labeling); I only include the strongest baseline in the table. The experimental results show that series saliency is also an effective augmentation. More visualization results (e.g., t-SNE) will be added.


| Label Ratio | 10% | 20% | 40% | 100% |
| --- | --- | --- | --- | --- |
| **Dataset: ChinaTown** | | | | |
| SemiTime | 44.88 (3.13) | 51.61 (1.22) | 58.71 (2.78) | 65.66 (1.58) |
| MeanTeacher | 45.54 (1.16) | 51.59 (1.98) | 62.87 (1.69) | 67.32 (0.12) |
| MT w/ SS | 47.31 (2.21) | 53.87 (1.12) | 63.45 (1.28) | 69.31 (0.11) |
| **Dataset: MFPT** | | | | |
| SemiTime | 54.96 (1.61) | 59.01 (1.56) | 62.38 (0.76) | 66.57 (0.67) |
| MeanTeacher | 56.33 (2.1) | 61.21 (2.17) | 63.37 (0.92) | 67.53 (1.98) |
| VT2 w/ SS | 57.24 (2.27) | 61.47 (1.91) | 64.9 (2.1) | 68.99 (1.98) |
| **Dataset: Epilep** | | | | |
| SemiTime | 81.46 (0.60) | 84.57 (0.49) | 86.91 (0.47) | 90.29 (0.32) |
| MeanTeacher | 91.92 (1.52) | 92.11 (0.32) | 94.37 (0.30) | 95.13 (0.21) |
| VT2 w/ SS | 92.28 (0.51) | 94.94 (0.68) | 96.36 (0.71) | 97.11 (0.11) |
| **Dataset: MFPT** | | | | |
| SemiTime | 64.16 (0.85) | 69.84 (0.94) | 76.49 (0.54) | 84.33 (0.50) |
| MeanTeacher | | | | |
| MT w/ SS | | | | |
| **Dataset: Epilep** | | | | |
| SemiTime | 74.86 (0.42) | 75.54 (0.63) | 77.01 (0.79) | 79.26 (1.20) |
| MeanTeacher | | | | |
| MT w/ SS | | | | |

Reference

The two papers were not presented at top-tier conferences; I think the main reason is the lack of further analysis of semi-supervised learning.

[1] Semi-Supervised Time Series Classification by Temporal Relation Prediction

[2] Self-Supervised Time Series Representation Learning by Inter-Intra Relational Reasoning

[3] Self-supervised Learning for Semi-supervised Time Series Classification

[3]Self-supervised Learning for Semi-supervised Time Series Classification
