yzhao062 / pyod

A Comprehensive and Scalable Python Library for Outlier Detection (Anomaly Detection)

Home Page: http://pyod.readthedocs.io

Can PyOD be used for anomaly class classification?

DylTop opened this issue · comments

commented

Thank you for sharing this outlier detection library. From reading the related papers and the API documentation, I found it well suited to outlier detection. For anomaly detection, in addition to outlier detection, there is also anomaly category detection. But I wonder if our library can classify and identify one-dimensional time series data of anomalous classes? Or can you recommend any good methods?

I'm not sure I understand your request correctly. Could you explain what you mean by:

But I wonder if our library can classify and identify one-dimensional time series data of anomalous classes?

For time series outlier detection there is a library named TODS. You could also consider Changepoynt, where I recommend the SST algorithm (disclaimer: this is my own work). Both focus on finding anomalous values in a given time series.

Another library, for motif and novelty detection, is stumpy, which is built on the matrix profile. The Dynamic Time Warping (DTW) distance is also a good similarity measure for comparing the shapes of time series.
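To illustrate why DTW is useful for shape comparison, here is a minimal pure-Python sketch of the textbook DTW recurrence with an absolute-difference cost. The function names and the toy series are made up for this example and are not taken from any of the libraries mentioned above; in practice an optimized implementation would be preferable.

```python
import math


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW distance with |x - y| as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b[j - 1] is reused
                cost[i][j - 1],      # b advances, a[i - 1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def euclidean_distance(a, b):
    """Point-by-point Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy data: the same bump, once early and once two steps later.
bump = [0, 0, 1, 2, 1, 0, 0, 0]
shifted_bump = [0, 0, 0, 0, 1, 2, 1, 0]

dtw_shifted = dtw_distance(bump, shifted_bump)
euclid_shifted = euclidean_distance(bump, shifted_bump)
print("DTW:", dtw_shifted, "Euclidean:", euclid_shifted)
```

Because DTW may stretch the flat segments, the shifted bump aligns with the original at zero cost, while the point-by-point Euclidean distance treats the two series as clearly different.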

For anomaly detection, in addition to outlier detection, there is also anomaly category detection.

Outlier category detection would be classification, which is outside the scope and focus of this library. For that, I would recommend looking at classification approaches instead.

If you mean that your samples are time series and you want to find differences or outliers in shape, you could treat this as a fairly high-dimensional outlier detection problem, which some algorithms in this package can handle. Each feature of a sample would then correspond to one time step of the series. I would be very careful here: the dimensionality can explode very quickly, and you will almost certainly have problems with translated (phase-shifted) series. I would also be careful with clustering for outlier detection, as it is quite difficult to get correct "means" for time series (again because of translation or phase shift, for example). DTW-based KNN classification might be a good approach for detecting anomaly classes. Somebody has also answered a similar question here.
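As a rough sketch of the DTW-based nearest-neighbour idea, the following pure-Python example classifies a query series by the label of its closest training series (1-NN), once with DTW and once with the plain Euclidean distance on the flattened series. The class names ("spike", "level_shift"), the toy data, and the helper names are all hypothetical and only serve to show the phase-shift problem described above.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance with |x - y| as local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def euclidean_distance(a, b):
    """Treats every time step as an independent feature of the sample."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(query, labelled_series, distance):
    """Return the label of the training series closest to `query` (1-NN)."""
    best_label, _ = min(
        ((label, distance(query, series)) for series, label in labelled_series),
        key=lambda pair: pair[1],
    )
    return best_label


# Hypothetical labelled examples of two anomaly classes.
train = [
    ([0, 0, 3, 0, 0, 0, 0, 0, 0, 0], "spike"),
    ([0, 0, 1, 1, 1, 1, 1, 1, 1, 1], "level_shift"),
]

# A spike that occurs five steps later than the training spike.
query = [0, 0, 0, 0, 0, 0, 0, 3, 0, 0]

dtw_prediction = nearest_label(query, train, dtw_distance)
euclid_prediction = nearest_label(query, train, euclidean_distance)
print("DTW 1-NN:", dtw_prediction)
print("Euclidean 1-NN:", euclid_prediction)
```

On this toy data the Euclidean variant assigns the shifted spike to the wrong class, because the two spikes do not overlap in time, whereas the DTW variant warps the time axis and matches it to the training spike. Real data would need more training examples per class and probably a warping-window constraint to keep the cost manageable.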

I can also recommend looking at the papers co-authored by Eamonn Keogh. I'm pretty sure he has done some work on DTW clustering, but I could not find it right away.

commented

OK, thank you very much for your suggestions.