jmbhughes / crcf

Combination Robust Cut Forests: Merging Isolation Forests and Robust Random Cut Forests

Home Page: https://jmbhughes.github.io/crcf/

Combination Robust Cut Forests

Isolation Forests [Liu+2008] and Robust Random Cut Trees [Guha+2016] are very similar in many ways, as outlined in the supporting overview. Most notably, they are extremes of the same outlier scoring function:

$$\theta \textrm{Depth} + (1 - \theta) \textrm{[Co]Disp}$$

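The combination robust cut forest lets you blend both scores by choosing a theta strictly between 0 and 1. With theta = 1 only the depth term remains, as in an Isolation Forest; with theta = 0 only the [Co]Disp term remains, as in a Robust Random Cut Forest.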
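As a rough illustration of the scoring function above, the sketch below applies the formula literally to per-point depth and [Co]Disp values. The function name combined_score and the sample numbers are hypothetical and are not part of the crcf API; the sketch also ignores any rescaling or sign convention the package may apply to make the two terms comparable.

```python
def combined_score(depth, codisp, theta):
    """Return theta * depth + (1 - theta) * codisp for a single point.

    Hypothetical helper for illustration only; not the crcf API.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta * depth + (1.0 - theta) * codisp


# Made-up per-point values, chosen only to show the blending.
depths = [3.0, 8.0, 5.0]
codisps = [12.0, 1.0, 4.0]

# theta = 1 keeps only the depth term (the Isolation Forest extreme).
pure_depth = [combined_score(d, c, 1.0) for d, c in zip(depths, codisps)]

# theta = 0 keeps only the [Co]Disp term (the Robust Random Cut Forest extreme).
pure_codisp = [combined_score(d, c, 0.0) for d, c in zip(depths, codisps)]

# Any theta in between mixes the two terms.
blended = [combined_score(d, c, 0.5) for d, c in zip(depths, codisps)]
```

Setting theta to 1 or 0 reproduces one of the two input lists unchanged, which is the sense in which the two methods are extremes of the same function.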

Install

You can install through pip with pip install crcf. Alternatively, you can download the repository and, from its root directory, run python3 setup.py install or pip3 install . (with the trailing dot). Please note that this package uses features from Python 3.7+ and is not compatible with earlier Python versions.

Tasks

  • complete basic implementation
  • provide clear documentation and usage instructions
  • ensure interface allows for fitting and scoring on multiple points at the same time
  • implement a better saving method than pickling
  • use random tests with hypothesis
  • implement tree down in Cython
  • accelerate forests with multi-threading
  • incorporate categorical variable support, including categorical rules
  • complete the write-up document with a benchmarking of performance

References

About

License: MIT License


Languages

Language:Python 100.0%