awqcs / HIF

Hybrid Isolation Forest

HIF

Hybrid Isolation Forest

The Hybrid Isolation Forest (HIF) is an extension of the [Isolation Forest (IF) algorithm](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html). IF and HIF are designed to detect anomalies and outliers in a distribution of data points. As such, they are alternatives to the one-class Support Vector Machine.
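For readers unfamiliar with the baseline, here is a minimal sketch of the standard Isolation Forest scoring scheme (random axis-parallel splits, average path length, score `2 ** (-E[h] / c(n))`). It uses only the standard library and is not the code of this package: the names `build_forest`, `anomaly_score` and `c_factor` are illustrative assumptions, and the HIF extensions are not included.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split the points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # nothing to split on along this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x lands in a leaf, corrected for the points left in that leaf."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def build_forest(data, ntrees=100, sample_size=64, seed=0):
    """Build ntrees trees, each on a random subsample (needs at least 2 points)."""
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(size))
    forest = [build_tree(rng.sample(data, size), 0, max_depth, rng) for _ in range(ntrees)]
    return forest, size


def anomaly_score(x, forest, size):
    """Score in (0, 1); higher means x is isolated faster, so more anomalous."""
    mean_depth = sum(path_length(x, tree) for tree in forest) / len(forest)
    return 2.0 ** (-mean_depth / c_factor(size))


# Toy usage: a Gaussian blob as normal data, then one central and one distant query.
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
forest, size = build_forest(data, ntrees=100, sample_size=64)
inlier = anomaly_score((0.0, 0.0), forest, size)
outlier = anomaly_score((8.0, 8.0), forest, size)
print(inlier, outlier)
```

The distant point is separated from the sample after only a few splits, so its average path is shorter and its score higher than that of the central point.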

HIF integrates two extensions that:

  • overcome a drawback of the Isolation Forest (IF) algorithm that limits its use for anomaly detection
  • give it some supervised learning capability from a few samples

The HIF is described (among other places) in this draft paper http://arxiv.org/abs/1705.03800. Please cite this draft paper if you use this code.

This is a simple package implementation of the HIF (inspired by this simple Python implementation of the Isolation Forest algorithm).

Installation

(Supports Python 3, possibly Python 2.)

$ sudo python3 setup.py install

Requirements

No extra requirements are needed.

Use

Launching the code

$ python3 -i testHIFDonuts.py

Creating an instance of the donut dataset (normal data) and the anomaly clusters (red, green, cyan)

createDonutData(contamin=.005)
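To give an idea of what such a dataset can look like, here is a hedged sketch that draws normal points on a noisy ring and adds three small anomaly clusters. It is not the package's `createDonutData`: the function `make_donut`, the ring radius and width, the cluster centres, and the reading of `contamin` as the per-cluster fraction of anomalies are all assumptions made for illustration.

```python
import math
import random


def make_donut(n_normal=1000, contamin=0.005, radius=3.0, width=0.3, seed=0):
    """Return (normal_points, anomalies), where anomalies maps a colour to its points."""
    rng = random.Random(seed)

    # Normal data: uniform angle, radius scattered around the ring.
    normal = []
    for _ in range(n_normal):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        r = rng.gauss(radius, width)
        normal.append((r * math.cos(angle), r * math.sin(angle)))

    # Hypothetical cluster centres: one inside the hole, two outside the ring.
    centres = {"red": (0.0, 0.0), "green": (5.5, 0.0), "cyan": (-4.0, 4.0)}
    # Assumption: contamin is the fraction of anomalies per cluster.
    n_anomalies = max(1, round(contamin * n_normal))
    anomalies = {
        colour: [(rng.gauss(cx, 0.2), rng.gauss(cy, 0.2)) for _ in range(n_anomalies)]
        for colour, (cx, cy) in centres.items()
    }
    return normal, anomalies


normal, anomalies = make_donut(n_normal=1000, contamin=0.005)
print(len(normal), {colour: len(pts) for colour, pts in anomalies.items()})
```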

Creating the HIF

computeHIF(ntrees=512, sample_size=64)

Evaluating globally the HIF (AUC)

Outputs the best alpha value (HIF1) and the best <alpha1, alpha2> values (HIF2)

plotGlobalAucBis(contamin=True)

Evaluating globally the 1C-SVM (AUC)

testOneClassSVM(NU=.1, GAMMA=.1)

Evaluating globally the 2C-SVM (AUC)

testTwoClassSVM(C=.1, GAMMA=.1)

Evaluating cluster by cluster the IF, HIF(1,2), 1C-SVM, 2C-SVM (AUC)

plotDetailedResults(alpha0=.5, alpha1=.5, alpha2=.5)
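The evaluation steps above all report an AUC. As a reminder of what that number measures, here is a small standard-library sketch that computes the ROC AUC as the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal point (ties count for one half). The function name `roc_auc` is an assumption for illustration; the package may compute the AUC differently.

```python
def roc_auc(labels, scores):
    """ROC AUC from binary labels (1 = anomaly, 0 = normal) and anomaly scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    # Count anomaly/normal pairs that are ranked correctly.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

An AUC of 1.0 means every anomaly is scored above every normal point, and 0.5 corresponds to a random ranking. The pairwise loop is quadratic, which is fine for a small test set but would be replaced by a rank-based computation on large data.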

License:Other

