mistaro / tsad

Package for Time Series Forecasting and Anomaly Detection Problems.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Downloads Downloads License

Time Series Anomaly detection.

The primary purpose of the TSAD (Python module) is to make life easier for researchers who use deep learning techniques for time series.

image-2

In particular, TSAD is created for solving Time Series Anomaly Detection problem by widely known technique:

  • Forecast a multivariate Time Series (TS) one point ahead (Also works for univariate TS)
  • Compute residuals between forecast and true values
  • Apply analysis of residuals and thus find anomalies

The functionality of the TSAD:

  • Preprocessing of Time Series (tsad.src module):

    • Automatic search for gaps or groups of missing values and solving this problem (tsad.src.df2dfs)

    • Conversion to a single sample rate or solving unevenly spaced time series problem n(tsad.src.df2dfs)

    • Splitting the entire history dataset, that is, one large time series, into a train and a test (tsad.src.ts_train_test_split) with a specific length of time series in one sample. Also, you can adjust the step, intersection of samples, and much more.

    • Collecting samples in batches by using a Loader (tsad.src.Loader)

  • Forecasting multi-step ahead both multivariate and univariate time series. As forecasting algorithms were implemented or will be implemented tsad.models:

    • A simple one-layer LSTM network (LSTM)
    • A two-layer LSTM network (DeepLSTM)
    • bi-directional LSTM network (BLSTM)
    • LSTM encoder-decoder (EncDec-AD)
    • LSTM autoencoder (LSTM-AE)
    • Convolutional LSTM network (ConvLSTM)
    • Convolutional Bi-directional LSTM network (CBLSTM)
    • Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED)
  • Calculation of residuals between forecast and real values. By default, the absolute difference is calculated. Still, you can write your function taking into account the requirements (requirements and other functions for calculating the residuals can be found in tsad.generate_residuals) and use it in the pipeline.

  • Residual analysis to find anomalies. There are various techniques for analyzing residuals. By default, T2 statistic is implemented , but you can write your function taking into account the requirements (requirements and other functions for analyzing residuals can be found in tsad. stastics) and use it in the pipeline.

  • Grouping of repeated time series values. tsad.src.split_by_repeated

  • Convenient loading of hyperparameters. tsad.useful.iterators.MeshLoader

Documentation

https://tsad.readthedocs.io/

The main class of the pipeline is tsad.main.AnomalyDetection


Getting Started

Installation through PyPi:

pip install -U tsad

  1. Primitive case
import pandas as pd
from tsad import main

# Loading ideal time series without any problem
df = pd.read_csv('example.csv',parse_dates='DT').set_index('DT') 

pipeline = maim.AnomalyDetection() 
pipeline.fit(df)
list_anomalies = pipeline.predict_anomaly(df)
forecast = pipeline.forecast(df)
  1. Advanced case
import pandas as pd
from tsad import main
from tsad.src import df2dfs
import torch

# Loading time series
df = pd.read_csv('example.csv',parse_dates='DT').set_index('DT') 

class my_preproc_func(...):
    ...
    

pipeline = maim.AnomalyDetection(preproc=my_preproc_func) 
pipeline.fit(df2dfs(df),
             n_epochs=3,
             optimiser=(torch.optim.Adam,{‘lr’:0.001}),
             batch_size=8,
             len_seq=60*3,
             test_size=0.4)

After that, you can see:

image-1

And then you can perform:

list_anomalies = pipeline.predict_anomaly(df2dfs(df))
forecast = pipeline.forecast(df2dfs(df))

More details you can find here


Thoughts

We encourage the community also to provide feedback on the desired functionality.

We plan to implement:

  1. More complex preprocessing of time series, especially in the area of reduction to a single sampling rate (problem of unevenly spaced time series)

  2. Implement other SOTA algorithms

  3. The ability to implement any model in our pipeline by just providing a link to GitHub. It seems to be a handy feature as many researchers need to verify their models with others.

  4. Integration with most forecasting and anomaly detection benchmarks.

Some interesting links:

  1. https://github.com/salesforce/Merlion
  2. https://github.com/fastforwardlabs/deepad
  3. https://github.com/HendrikStrobelt/LSTMVis
  4. https://github.com/TezRomacH/python-package-template
  5. https://github.com/khundman/telemanom
  6. https://github.com/signals-dev/Orion
  7. https://github.com/NetManAIOps/OmniAnomaly
  8. https://github.com/unit8co/darts
  9. https://github.com/tinkoff-ai/etna-ts
  10. https://github.com/yzhao062/pyod
  11. https://www.radiativetransfer.org/misc/typhon/doc/modules.html#datasets How include dataset
Merlion Alibi Detect Kats pyod GluonTS RRCF STUMPY Greykite Prophet pmdarima deepad TSAD
Forecasting (Прогнозирование)
Anomaly Detection (Поиск аномалий)
Metrics (Алгоритмы оценки)
Ensembles (Ансамбли)
Benchmarking (Бенчмарки и датасеты)
Visualization (Визуализация результатов)
Data preprocessing (Предварительная обработка данных)
Automated EDA (Автоматизированный разведочный анализ данных)

Dependencies

  • python==3.7.6
  • numpy>=1.20.0
  • pandas>=1.0.1
  • matplotlib>=3.1.3
  • scikit-learn>=0.24.1
  • torch==1.5.0

Repo structure

  └── repo 
    ├───docs       # documentation
    ├───examples   # examples
    ├───tsad       # files of library

About

Package for Time Series Forecasting and Anomaly Detection Problems.

License:BSD 3-Clause "New" or "Revised" License


Languages

Language:Python 100.0%