timoklein / outlier_detection

Class based Python implementations of outlier detection algorithms.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Outlier detection algorithms

This repo contains class based implementations of three outlier detection algorithms.
Implementations were used for a university project.
O-Means outlier detection example for different sensitivities z

Outlier detection Using Clustering Methods [2004, Loureiro et. al]

Uses agglomerative clustering to detect outliers. The hierarchy is cut at a specified number of clusters. All clusters containing less samples than a threshold are considered outliers.
The original paper can be found here.

k-means-- [2013, Chawla & Gionies]

Uses a modified k-means algorithm to perform robust centroid updates with respect to outliers. The points with the largest point to centroid distances are considered outliers.
The original paper can be found here.

o-medians

Outlier detection algorithm based on k-means# [2017, Olukanmi & Twala]. Uses a k-medians based robust hierarchical initialization [2007, Arai & Barakbah]. Performs k-medians and detects the points which are more than z standard deviations distant from their centroid as outliers. Additionally clusters which are both distant from all other clusters and contain few observations are considered as clusters of outliers.

Getting Started

Experiments.py contains a number of sample experiments on toy datasets.

Prerequisites

Python 3.x

  • numpy
  • pandas
  • SciPy
  • scikit-learn
  • matplotlib

It is recommended to install the requirements through the Anaconda Python distribution. IMPORTANT: scikit-learn version needs to be >0.20.0, else the function pairwise_distances_argmin_min is bugged.

Authors

  • Timo Klein -

Acknowledgments

  • Oscar for being a cool cat and lending his name to the project

About

Class based Python implementations of outlier detection algorithms.


Languages

Language:Scilab 88.8%Language:Python 11.2%