SigMahaKNN (`signature_mahalanobis_knn`) combines the variance norm (a
generalisation of the Mahalanobis distance) with path signatures for anomaly
detection on multivariate streams. The `signature_mahalanobis_knn` library is a
Python implementation of the SigMahaKNN method. The key contributions of this
library are:
- A simple and efficient implementation of the variance norm distance, provided by the `signature_mahalanobis_knn.Mahalanobis` class. The class has two main methods:
  - The `fit` method to fit the variance norm distance to a training dataset
  - The `distance` method to compute the distance between two `numpy` arrays `x1` and `x2`
- A simple and efficient implementation of the SigMahaKNN method, provided by the `signature_mahalanobis_knn.SigMahaKNN` class. The class has two main methods:
  - The `fit` method to fit a model to a training dataset
    - The `fit` method can take as input either a corpus of streams (in which case path signatures are computed using the `sktime` library with `esig` or `iisignature`) or a corpus of path signatures. This also opens up the possibility of using other feature representations with the variance norm distance for anomaly detection
    - Currently, the library uses either `sklearn`'s `NearestNeighbors` class or `pynndescent`'s `NNDescent` class to efficiently compute the nearest neighbour distances of a new data point to the corpus of training data
  - The `conformance` method to compute the conformance score for a set of new data points
    - Similarly to the `fit` method, the `conformance` method can take as input either a corpus of streams (in which case path signatures are computed using the `sktime` library with `esig` or `iisignature`) or a corpus of path signatures
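To make the "path signature of a stream" feature concrete, here is a minimal, standard-library-only sketch of the signature truncated at depth 2 for a piecewise-linear path. It is only an illustration of the idea: the library itself computes signatures through `sktime` with `esig` or `iisignature`, and the function name `signature_depth2` is hypothetical and not part of `signature_mahalanobis_knn`.

```python
def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path.

    `path` is a sequence of points, each a sequence of d floats.
    Returns (level1, level2): level1 is the total increment (length d),
    level2 is the d x d matrix of second-order iterated integrals.
    """
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending one linear segment:
        # S2 <- S2 + S1 (x) delta + (delta (x) delta) / 2
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        # S1 <- S1 + delta
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2


# A 2-D stream that moves right by 1, then up by 1.
level1, level2 = signature_depth2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
print(level1)
print(level2)
```

The off-diagonal entries of `level2` differ when the order of the moves matters, which is what lets signature features distinguish streams that share the same overall increment.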
The SigMahaKNN library is available on PyPI and can be installed with `pip`:
pip install signature_mahalanobis_knn
As noted above, the `signature_mahalanobis_knn` library has two main classes:
`Mahalanobis`, a class for computing the variance norm distance, and
`SigMahaKNN`, a class for computing the conformance score for a set of new data
points.
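The roles of the two classes can be illustrated with a small `numpy` sketch. This is not the library's implementation or API: it assumes the distance can be approximated as a Mahalanobis-type distance built from the pseudo-inverse of the corpus covariance, and that the conformance score of a new point is its distance to the nearest corpus point. All function names below are hypothetical.

```python
import numpy as np


def fit_precision(corpus):
    """Pseudo-inverse of the empirical covariance of the corpus rows (assumed form)."""
    cov = np.cov(corpus, rowvar=False)
    return np.linalg.pinv(cov)


def mahalanobis_type_distance(x1, x2, precision):
    """Distance between x1 and x2, rescaled by the corpus covariance."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    # Guard against tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(diff @ precision @ diff, 0.0)))


def nearest_neighbour_score(x, corpus, precision):
    """Assumed conformance score: distance from x to its nearest corpus point."""
    return min(mahalanobis_type_distance(x, y, precision) for y in corpus)


# Corpus whose first coordinate varies twice as much as the second.
corpus = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0], [2.0, 1.0]])
precision = fit_precision(corpus)

# Steps of 2 along the first axis and 1 along the second are equally "far".
d_wide = mahalanobis_type_distance(corpus[0], corpus[1], precision)
d_narrow = mahalanobis_type_distance(corpus[0], corpus[2], precision)

score_inlier = nearest_neighbour_score([1.0, 0.5], corpus, precision)
score_outlier = nearest_neighbour_score([10.0, 0.5], corpus, precision)
print(d_wide, d_narrow, score_inlier, score_outlier)
```

A brute-force minimum is used here for readability; the library instead relies on `sklearn`'s `NearestNeighbors` or `pynndescent`'s `NNDescent` for the nearest-neighbour search.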
The core implementation of the SigMahaKNN method is in the
`src/signature_mahalanobis_knn` folder:
- `mahal_distance.py` contains the implementation of the `Mahalanobis` class to compute the variance norm distance
- `sig_maha_knn.py` contains the implementation of the `SigMahaKNN` class to compute the conformance scores for a set of new data points against a corpus of training data
- `utils.py` contains utility functions used by the library
- `baselines/` is a folder containing some of the baseline methods we look at in the paper; see paper-examples/README.md for more details
There are various examples in the `examples` and `paper-examples` folders:
- `examples` contains small examples using randomly generated data for illustration purposes
- `paper-examples` contains the examples used in the paper (link available soon!) where we compare the SigMahaKNN method to other baseline approaches (e.g. Isolation Forest and Local Outlier Factor) on real-world datasets
  - There are notebooks for downloading and preprocessing the datasets for the examples; see paper-examples/README.md for more details
To take advantage of `pre-commit`, which will automatically format your code and
run some basic checks before you commit, install it and set up the git hook:
pip install pre-commit # or brew install pre-commit on macOS
pre-commit install # will install a pre-commit hook into the git repo
After doing this, each time you commit, some linters will be applied to format
the codebase. Alternatively, you can run `pre-commit run --all-files` to run
the checks manually.
See CONTRIBUTING.md for more information on running the test
suite using `nox`.