mohcinemadkour / TagAnomaly

Anomaly detection analysis and labeling tool, specifically for multiple time series (one time series per category)

taganomaly

Anomaly detection labeling tool, specifically for multiple time series (one time series per category).

Note:

This tool was built as a part of an engagement, and is not maintained on a regular basis.

TagAnomaly is a tool for creating labeled data for anomaly detection models. It lets the labeler select points on a time series and inspect them further, either by looking at how other time series behave over the same time range or by looking at the raw data behind the time series (assuming that the time series is an aggregated metric, counting events per time range).

Deploy to Azure: the app can be deployed on Azure using Azure Container Instances.

The app has four main windows:

1. The labeling window

  • Time series labeling
  • Selected points table view
  • View raw data for window (if exists)

2. Compare this category with others over time

3. Find proposed anomalies using the Twitter AnomalyDetection package

4. Observe the changes in distribution between categories

This could be useful to understand whether an anomaly was univariate or multivariate.

How to run locally:

This tool uses the shiny framework for visualizing events. To run it, you need R and, preferably, RStudio. Once everything is installed, open the project in RStudio and click Run App, or call runApp() from the console. You might need to install the required packages manually.

Requirements

  • R (3.4.0 or above)

Used packages:

  • shiny
  • dplyr
  • gridExtra
  • shinydashboard
  • DT
  • ggplot2
  • shinythemes
  • AnomalyDetection

How to deploy using Docker:

Option 1: Deploy to Azure Web App for Containers or Azure Container Instances. See the Azure documentation for each service for more details.

Option 2: Deploy this image to your own environment.

Dockerize the shiny app:

Follow the steps in the rize project on how to deploy on shiny-server. The default port is 3838, so make sure it is open, or change the default port to something else.

Instructions of use

  1. Import time series CSV file. Assumed structure:
     • date ("%Y-%m-%d %H:%M:%S"). TagAnomaly will attempt to infer the date from other patterns as well, using the parsedate package.
     • category (optional)
     • value

  2. (Optional) Import raw data time series CSV file. If the original time series is an aggregation over time windows, this file holds the raw values themselves. This makes it possible to dive deeper into an anomalous value and see what it is composed of. Assumed structure:
     • date ("%Y-%m-%d %H:%M:%S"). TagAnomaly will attempt to infer the date from other patterns as well, using the parsedate package.
     • category (optional)
     • content

  3. Select category (optional, if exists).

  4. Select time range on slider.

  5. Select points on the plot that look anomalous.
     • Optional (1): click on one time range in the table below the plot to see the raw data for this time range.
     • Optional (2): open the All Categories tab to see how other time series behave over the same time range.

  6. Once you decide that these are actual anomalies, save the resulting table to CSV by clicking on Download labels set, and continue to the next category.
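
The relationship between the two input files can be sketched in Python. This is only an illustration, not part of TagAnomaly: it assumes hourly time windows, uses made-up events, and the helper names (aggregate_hourly, to_csv) are hypothetical. It counts raw events (date, category, content) per hour and category, which yields rows in the assumed time series structure (date, category, value).

```python
import csv
import io
from collections import Counter
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # date format named in the instructions


def aggregate_hourly(raw_rows):
    """Count raw events per (hour, category).

    raw_rows: iterable of dicts with 'date', 'category' and 'content' keys.
    Returns a list of dicts with 'date', 'category' and 'value' keys,
    sorted by window start and then by category.
    """
    counts = Counter()
    for row in raw_rows:
        ts = datetime.strptime(row["date"], DATE_FORMAT)
        window_start = ts.replace(minute=0, second=0)  # truncate to the hour
        counts[(window_start, row["category"])] += 1
    return [
        {"date": start.strftime(DATE_FORMAT), "category": cat, "value": n}
        for (start, cat), n in sorted(counts.items())
    ]


def to_csv(rows, fieldnames):
    """Serialize rows to CSV text with a single header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up raw events, for illustration only.
raw_events = [
    {"date": "2018-01-01 10:05:00", "category": "login", "content": "user a"},
    {"date": "2018-01-01 10:40:12", "category": "login", "content": "user b"},
    {"date": "2018-01-01 11:02:30", "category": "login", "content": "user c"},
    {"date": "2018-01-01 10:15:00", "category": "error", "content": "timeout"},
]

time_series = aggregate_hourly(raw_events)
print(to_csv(time_series, ["date", "category", "value"]))
```

Keeping the raw file next to the aggregated one is what lets a labeler click a suspicious window and see the individual events behind its count.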

Current limitations/issues

It is currently impossible to make multiple selections on one plot. A workaround is to select one area, download the CSV, and then select the next area. Each downloaded CSV has a random string in its name, so files won't overwrite each other. Once labeling is finished, one option is to run the provided prep_labels.py file to concatenate all of TagAnomaly's output files into one CSV.
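
For readers who want to see what such a concatenation step involves, here is a minimal Python sketch. It is not the contents of prep_labels.py. It assumes that all downloaded label files sit in one directory and share the same header row; the function name concat_label_files, the file names, and the columns in the demo are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


def concat_label_files(input_dir, output_path):
    """Concatenate every *.csv in input_dir into one CSV at output_path.

    Assumes all files share the same header row; the header is written once.
    Returns the number of data rows written.
    """
    output_path = Path(output_path)
    header = None
    rows = []
    for path in sorted(Path(input_dir).glob("*.csv")):
        if path.resolve() == output_path.resolve():
            continue  # do not read a previous combined output back in
        with path.open(newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path.name} has a different header")
            rows.extend(reader)
    with output_path.open("w", newline="") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


# Demo with two made-up label files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "labels_abc.csv").write_text(
        "date,category,value\n2018-01-01 10:00:00,login,2\n"
    )
    (tmp_dir / "labels_xyz.csv").write_text(
        "date,category,value\n2018-01-01 11:00:00,login,9\n"
    )
    written = concat_label_files(tmp_dir, tmp_dir / "all_labels.csv")
    print(f"wrote {written} rows")
```

Raising an error on a header mismatch is a deliberate choice in this sketch: silently merging files with different columns would produce a label set that looks valid but is misaligned.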

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

About

License: MIT License

Languages

  • R 96.2%
  • Python 3.8%