yzhao062 / pyod

A Comprehensive and Scalable Python Library for Outlier Detection (Anomaly Detection)

Home Page: http://pyod.readthedocs.io


Which algorithms of PyOD are "robust"?

asmaier opened this issue · comments

In many cases, training on unlabeled data that already contains anomalies (outliers) can lead to learning a wrong detection model. For these cases, so-called robust algorithms have been developed, but I couldn't find documentation about which algorithms in PyOD are robust. For example, the PyOD implementation of PCA appears to use sklearn's PCA, which is not a robust PCA as described at https://en.wikipedia.org/wiki/Robust_principal_component_analysis or https://en.wikipedia.org/wiki/L1-norm_principal_component_analysis. So which algorithms in PyOD are really robust?
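For context, here is a minimal sketch of how the PCA detector in question is typically used (class and attribute names follow the PyOD docs; the synthetic contaminated data is purely illustrative). The point is that the model is fit directly on data that may already contain outliers, which is exactly the situation where a non-robust, least-squares PCA can be pulled off course:

```python
import numpy as np
from pyod.models.pca import PCA  # PyOD's PCA-based outlier detector

# Illustrative unlabeled training data: mostly inliers plus a few
# contaminating outliers, as described in the question above.
rng = np.random.default_rng(42)
X_train = rng.normal(size=(500, 10))
X_train[:10] += 8.0  # shift a handful of rows far from the bulk

clf = PCA(contamination=0.02)  # expected fraction of outliers
clf.fit(X_train)               # fit on data that already contains outliers

scores = clf.decision_scores_  # outlier scores on the training data
labels = clf.labels_           # 0 = inlier, 1 = outlier
```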

Robustness is a relative term. I would recommend Isolation Forest as an ensemble method: good performance and relatively good robustness.
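For reference, a minimal sketch of the recommended Isolation Forest detector in PyOD (class and parameter names follow the PyOD docs; the data is illustrative only). Isolation Forest scores points by how quickly random splits isolate them, which is why a small fraction of outliers in the training data tends to distort it less than it would a least-squares method:

```python
import numpy as np
from pyod.models.iforest import IForest  # PyOD's Isolation Forest detector

# Illustrative contaminated training set: inliers plus a few outliers.
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(size=(490, 5)),          # inliers
    rng.normal(loc=6.0, size=(10, 5)),  # contaminating outliers
])

clf = IForest(n_estimators=200, contamination=0.02, random_state=0)
clf.fit(X_train)

outlier_scores = clf.decision_scores_  # higher score = more anomalous
predicted_labels = clf.labels_         # 0 = inlier, 1 = outlier
```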

I think there is a misunderstanding. Robustness in statistics is not a relative term. There is a whole field called robust statistics.

Robust statistics seek to provide methods that emulate popular statistical methods, but are not unduly affected by outliers or other small departures from model assumptions. (https://en.wikipedia.org/wiki/Robust_statistics)

But I agree the term robust can have different meanings for people not familiar with that field, so let me reformulate my question:

Which algorithms of PyOD are not unduly affected by outliers?