A Julia package for Support Vector Data Description.
This package implements one-class classifiers based on support vector data description. It has been developed as part of a benchmark suite for active-learning strategies for one-class classification. For more information about this research project, see the OCAL project website and the companion paper.
Holger Trittenbach, Adrian Englhardt, Klemens Böhm, "An Overview and a Benchmark of Active Learning for One-Class Classification" arXiv:1808.04759, 14 Aug 2018
This package works with Julia 1.0. It is not registered yet, so use the following commands to add it with the package manager (Pkg3).
using Pkg
Pkg.add(PackageSpec(url="https://github.com/englhardt/SVDD.jl.git"))
The results presented in the paper are based on a previous version of the package and on Julia 0.6.
To reproduce the experimental results from the paper, use the old package manager (with Pkg.clone) and check out SVDD.jl at tag v1.0.
One-class classifiers learn to identify whether objects belong to a specific class. They are often used for outlier detection.
The package implements several one-class classifiers and strategies to initialize their parameters.
The Jupyter notebook in examples gives a jump start on how to use this package.
The classifiers are currently implemented as optimization problems based on JuMP. The package includes:
- Vanilla Support Vector Data Description (VanillaSVDD) [1]
- SVDD with negative examples (SVDDNeg) [1]
- Semi-supervised Anomaly Detection (SSAD) [2]
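For orientation, the optimization problem behind Vanilla SVDD can be written as follows, using the notation of Tax and Duin [1]. The symbols are introduced here for illustration and are not taken from the package: $R$ is the radius and $a$ the center of the enclosing hypersphere, $\xi_i$ are slack variables, $C$ is the cost parameter, and $\phi$ is the feature map induced by the kernel $k$. The second problem is the corresponding dual. Whether the package's JuMP models solve the primal, the dual, or a variant is not stated here.

```latex
% Primal: smallest hypersphere that encloses most observations
\begin{aligned}
\min_{R,\,a,\,\xi}\quad & R^2 + C \sum_{i=1}^{N} \xi_i \\
\text{s.t.}\quad & \lVert \phi(x_i) - a \rVert^2 \le R^2 + \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N
\end{aligned}

% Dual: depends on the data only through the kernel k
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{N} \alpha_i \, k(x_i, x_i) - \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j \, k(x_i, x_j) \\
\text{s.t.}\quad & \sum_{i=1}^{N} \alpha_i = 1, \quad 0 \le \alpha_i \le C, \quad i = 1, \dots, N
\end{aligned}
```

A larger $C$ penalizes slack more heavily, so fewer training observations end up outside the hypersphere.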
There are two types of parameters to estimate for the classifiers: cost parameters and a kernel function. The package includes the following strategies to initialize these parameters.
- Gauss Kernel gamma
  - Rule of Scott [3]
  - Rule of Silverman [4]
  - Fixed Gamma
- Cost parameters C
  - Rule of Tax [1]
  - Binary Search
  - Fixed C
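To give a feel for the two rule-of-thumb strategies for gamma, here is a minimal sketch in plain Julia. It does not use the package's API. The function names are hypothetical, and the sketch makes two assumptions that may not match the package's implementation: each attribute is standardized to unit variance, and the Gauss kernel is parametrized as k(x, y) = exp(-gamma * ||x - y||^2), so a bandwidth h maps to gamma = 1 / (2 h^2).

```julia
# Bandwidth by the rule of Scott for n observations with d attributes,
# assuming each attribute has unit variance (hypothetical helper).
scott_bandwidth(n::Integer, d::Integer) = n^(-1 / (d + 4))

# Bandwidth by the rule of Silverman under the same assumption (hypothetical helper).
silverman_bandwidth(n::Integer, d::Integer) = (n * (d + 2) / 4)^(-1 / (d + 4))

# Assumed kernel parametrization: k(x, y) = exp(-gamma * ||x - y||^2),
# so a bandwidth h corresponds to gamma = 1 / (2 h^2).
bandwidth_to_gamma(h::Real) = 1 / (2 * h^2)

# Data layout: attributes in rows, observations in columns.
data = randn(3, 200)
d, n = size(data)

gamma_scott = bandwidth_to_gamma(scott_bandwidth(n, d))
gamma_silverman = bandwidth_to_gamma(silverman_bandwidth(n, d))

println("gamma (Scott):     ", gamma_scott)
println("gamma (Silverman): ", gamma_silverman)
```

Both rules shrink the bandwidth as the number of observations grows, which increases gamma. For d = 2 the two formulas coincide, because the Silverman factor (d + 2) / 4 equals 1.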
- Classification scores: The classifiers return scores by the following convention:
  - score > 0 for outliers
  - score <= 0 for inliers
- Data Format: The data is expected to be in column-major order, i.e., the first array dimension is the attribute and the second is the observation.
  For example, [1 2 3 4; 5 6 7 8] is a 2x4 Array with 2 attributes and 4 observations.
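The two conventions above can be illustrated with a short sketch in plain Julia. The scores are made-up values for illustration, not output from one of the package's classifiers, and the variable names are hypothetical.

```julia
# Two attributes (rows) and four observations (columns).
data = [1 2 3 4; 5 6 7 8]
n_attributes, n_observations = size(data)

# Each observation is one column of the matrix.
first_observation = data[:, 1]   # [1, 5]

# Data stored with one observation per row must be transposed first.
row_major = [1 5; 2 6; 3 7; 4 8]
converted = permutedims(row_major)

# Hypothetical scores, one per observation; a real classifier would compute these.
scores = [-0.7, 0.0, 0.3, 1.2]

# Convention: score > 0 marks an outlier, score <= 0 an inlier.
labels = [s > 0 ? :outlier : :inlier for s in scores]
println(labels)
```

Note that a score of exactly 0 counts as an inlier under this convention.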
We welcome contributions and bug reports.
This package is developed and maintained by Holger Trittenbach and Adrian Englhardt.
[1] Tax, David MJ, and Robert PW Duin. "Support vector data description." Machine learning 54.1 (2004): 45-66.
[2] Görnitz, Nico, et al. "Toward supervised anomaly detection." Journal of Artificial Intelligence Research 46 (2013): 235-262.
[3] Scott, David W. Multivariate density estimation: theory, practice, and visualization. John Wiley & Sons, 2015.
[4] Silverman, Bernard W. Density estimation for statistics and data analysis. Routledge, 2018.