LimaoC / amodely

Anomaly Detection Project for Auto & General

amodely

An anomaly detection dashboard for time-series data

Table of Contents
  1. About the Project
  2. Getting Started
  3. Usage
  4. Documentation

About the Project

Amodely is an anomaly detection dashboard that I built during my time as a Pricing Intern at Auto & General. It identifies anomalies in time-series data, primarily with the Seasonal-Trend decomposition using LOESS (STL) algorithm.
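To give a rough feel for decomposition-based anomaly detection, here is a minimal standard-library sketch. It is not STL and not this project's implementation: it swaps in a simpler classical decomposition (centred moving-average trend plus per-position seasonal means) and flags points whose residual is far from zero. The function name `detect_anomalies`, the `period` and `z` parameters, and the synthetic data are all assumptions made for illustration.

```python
from statistics import mean, stdev


def detect_anomalies(values, period=7, z=1.96):
    """Return indices whose residual is more than `z` standard deviations out.

    Simplified stand-in for STL (assumption, not the project's code): the
    trend is a centred moving average and the seasonal component is the mean
    detrended value at each position in the cycle.
    """
    n = len(values)
    half = period // 2
    # Trend: centred moving average, truncated at the edges of the series.
    trend = [mean(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]
    detrended = [v - t for v, t in zip(values, trend)]
    # Seasonal: average detrended value for each position in the cycle.
    seasonal = [mean(detrended[p::period]) for p in range(period)]
    residuals = [d - seasonal[i % period] for i, d in enumerate(detrended)]
    centre, spread = mean(residuals), stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r - centre) > z * spread]


# Eight weeks of synthetic daily data with a weekly pattern and one spike.
weekly_pattern = [10, 12, 11, 13, 12, 6, 5]
series = [weekly_pattern[i % 7] + 0.1 * i for i in range(56)]
series[30] += 25  # injected anomaly
anomalies = detect_anomalies(series)
```

A true STL fit replaces the moving average and seasonal means with LOESS smoothers, which makes the decomposition more robust to outliers than this sketch.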

Tech Stack

The tech stack consists wholly of Python and various Python frameworks.

Getting Started

Dependencies

The required dependencies can be found under /requirements.txt.

Installation

To install and set up the dashboard, open up Windows PowerShell or Git Bash and follow the steps below:

  1. Clone the repo and enter the directory

    git clone https://github.com/LimaoC/amodely.git
    
    cd amodely
    
  2. Create a virtual environment

    python -m virtualenv venv
    
  3. Enter the virtual environment

    Windows PowerShell:

    venv/Scripts/activate
    

    Git Bash:

    source venv/Scripts/activate
    
  4. Install the required dependencies

    pip install -r requirements.txt
    
  5. Create a .env file in the root directory with the following variable pointing to the folder that contains the dataset:

    DATASET_PATH="C:/Path/To/Dataset/"
    
  6. Set the DATASET_NAME variable in /src/lib/lib.py to the filename of your dataset (the default is dataset.xlsx):

    DATASET_NAME = "dataset.xlsx"

  7. Run the dashboard on localhost

    Dashboard:

    python -m src.dash-app.app
    

    Anomaly detection model (for debugging purposes):

    python -i -m src.amodely
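
Steps 5 and 6 split the dataset location into a folder (DATASET_PATH in .env) and a filename (DATASET_NAME). The sketch below shows one way these two values could be combined using only the standard library. It is an assumption about how the pieces fit together, not the project's actual loading code (which may well use a dedicated .env library); `read_env_file` and `resolve_dataset_path` are hypothetical helper names.

```python
import os
from pathlib import Path

DATASET_NAME = "dataset.xlsx"  # default filename from step 6


def read_env_file(env_path):
    """Parse simple KEY="value" lines; deliberately not a full .env parser."""
    variables = {}
    for line in Path(env_path).read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        variables[key.strip()] = value.strip().strip('"').strip("'")
    return variables


def resolve_dataset_path(env_path=".env"):
    """Join DATASET_PATH from the .env file with DATASET_NAME."""
    env = read_env_file(env_path)
    # Fall back to a real environment variable if the file lacks the key.
    folder = env.get("DATASET_PATH") or os.environ["DATASET_PATH"]
    return Path(folder) / DATASET_NAME
```

Calling `resolve_dataset_path()` from the repository root would then yield the full path to the spreadsheet, provided the .env file from step 5 exists.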
    

Usage

Examples of how the dashboard can be used (note that the data below was randomly generated):

General Plotly features

  • Hover over data points to see info (date, category, conversion rate)
  • Adjust graph axes dynamically
  • Zoom in on a particular region
  • Download plot as a PNG

[Image: ex1]

Master dashboard

  • Configurations:
    • Graphing Conversion Rate vs. Quote Date
    • Categorising data by Dimension 1
    • All categories displayed (CATEGORY_1A, CATEGORY_1B, ...)
    • Filtering for Dimension 2 data that are either in the category CATEGORY_2A or CATEGORY_2B
    • Removing categories that have fewer than 100 entries
    • Filtering for 2020 data
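
The filters in this configuration can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the row layout (`quote_date`, `dimension_1`, `dimension_2` keys), the `filter_quotes` name, and the order in which the filters are applied are all hypothetical, and the dashboard itself may do this differently.

```python
from collections import Counter
from datetime import date

MIN_ENTRIES = 100  # category size threshold from the configuration above


def filter_quotes(rows, year, allowed_dim2, min_entries=MIN_ENTRIES):
    """Apply the year and Dimension 2 filters, then drop small categories."""
    kept = [
        row for row in rows
        if row["quote_date"].year == year and row["dimension_2"] in allowed_dim2
    ]
    # Count the surviving rows per Dimension 1 category.
    counts = Counter(row["dimension_1"] for row in kept)
    return [row for row in kept if counts[row["dimension_1"]] >= min_entries]


# Hypothetical rows: 120 entries in CATEGORY_1A and 5 in CATEGORY_1B.
rows = [
    {"quote_date": date(2020, 1, 1 + i % 28),
     "dimension_1": "CATEGORY_1A", "dimension_2": "CATEGORY_2A"}
    for i in range(120)
] + [
    {"quote_date": date(2020, 3, 1),
     "dimension_1": "CATEGORY_1B", "dimension_2": "CATEGORY_2B"}
    for _ in range(5)
]
filtered = filter_quotes(rows, 2020, {"CATEGORY_2A", "CATEGORY_2B"})
```

Here CATEGORY_1B is dropped because only 5 of its rows survive the filters, which is below the 100-entry threshold.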

[Images: ex2, ex3]

Anomaly detection dashboard

  • Image 1 configurations:
    • Graphing Quote Volume vs. Quote Date
    • Categorising data by All (combining all dimensions)
    • No filter applied
    • Categories with fewer than 100 entries removed automatically to avoid interfering with the anomaly detection algorithm
    • Anomaly detection algorithm running at a confidence level of 95% (default)
    • Filtering for all data (2020 - 2021)
    • Hovering over the week of Sep 06, 2021 to inspect daily data points from that week
  • Image 2 configurations:
    • Graphing Conversion Rate vs. Quote Date
    • Categorising data by Dimension 1
    • Isolating second and third category in Dimension 1
    • No filter applied
    • Categories with fewer than 100 entries removed automatically to avoid interfering with the anomaly detection algorithm
    • Anomaly detection algorithm running at a confidence level of 80% (a lower threshold in standard deviations, so more points are flagged as outliers)
    • Filtering for all data (2020 - 2021)
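
To see why a lower confidence level flags more outliers, the sketch below converts a confidence level into a threshold in standard deviations. It assumes a two-sided interval under a normal distribution, which may not be exactly how the dashboard computes its threshold; `sigma_threshold` is a hypothetical helper name.

```python
from statistics import NormalDist


def sigma_threshold(confidence):
    """Two-sided normal threshold, in standard deviations, for a confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


threshold_95 = sigma_threshold(0.95)  # roughly 1.96 standard deviations
threshold_80 = sigma_threshold(0.80)  # roughly 1.28 standard deviations
```

Under this assumption, dropping from 95% to 80% narrows the accepted band from about 1.96 to about 1.28 standard deviations, so points closer to the expected value are reported as anomalies.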

[Image: ex4]

Anomaly detection output table

  • Table of anomalies based on the current configurations of the anomaly detection dashboard
  • Updates dynamically when settings/configurations are changed
  • Export as CSV

Documentation

The documentation can be viewed here.

License: GNU General Public License v3.0

Languages: Python 97.3%, CSS 2.7%