Marcnuth / AnomalyDetection

Twitter's Anomaly Detection in Pure Python

median absolute deviation or mean absolute deviation

ColaBH opened this issue · comments

The original algorithm uses the median absolute deviation in place of the standard deviation, which makes it more robust against anomalous points.

This code, however, uses pandas' mad(), which computes the mean absolute deviation, not the median absolute deviation. Both can work, but I think the median absolute deviation is the better choice.

@ColaBH Interesting, have you tested both versions to see which is better? @Marcnuth depending upon how testing goes, should this be configurable (median/mean absolute deviation)?

I think which one is better depends on what the data looks like.
In my data there was no big difference, because my time series didn't contain any really large or small values, so the median absolute deviation and the mean absolute deviation came out close.
But if your data can contain really large or small values, I think the median absolute deviation is more robust.
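
To make the robustness point concrete, here is a minimal sketch using only the standard library. The helper names mean_abs_dev and median_abs_dev and the sample values are made up for illustration; they are not taken from the repository.

```python
from statistics import mean, median


def mean_abs_dev(values):
    """Mean absolute deviation: average distance from the mean."""
    m = mean(values)
    return mean([abs(v - m) for v in values])


def median_abs_dev(values):
    """Median absolute deviation (unscaled): median distance from the median."""
    med = median(values)
    return median([abs(v - med) for v in values])


# Illustrative data: a quiet series, and the same series with one extreme point.
clean = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 11.0]
spiky = clean[:-1] + [1000.0]

for name, series in (("clean", clean), ("spiky", spiky)):
    print(name, mean_abs_dev(series), median_abs_dev(series))
```

With these values the median absolute deviation is 1.0 for both series, while the mean absolute deviation grows by more than two orders of magnitude once the extreme point is present. That matches the observation above: on well-behaved data the two are close, and they diverge when the data contains very large or very small values.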

If you want to try the median absolute deviation, you can make the following modification.
The original version:
https://github.com/Marcnuth/AnomalyDetection/blob/master/anomaly_detection/anomaly_detect_ts.py#L560

ares = ares / data.mad()

And I changed it to use the median absolute deviation:

from statsmodels import robust
ares = ares / robust.mad(data.dropna())
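
Two caveats, as far as I know: statsmodels' robust.mad divides by a normalization constant by default (c, roughly 0.6745), so it returns a scaled value rather than the raw median absolute deviation, and Series.mad() was deprecated in pandas 1.5 and removed in 2.0. If you would rather not depend on either, a pandas-only sketch could look like this. The function median_abs_dev and its scale parameter are hypothetical names, not part of the repository or of pandas.

```python
import pandas as pd


def median_abs_dev(series: pd.Series, scale: float = 1.0) -> float:
    """Median absolute deviation of the non-missing values, divided by `scale`.

    scale=1.0 gives the raw value; scale=0.6744897501960817 is assumed here
    to mirror the default normalization used by statsmodels' robust.mad.
    """
    clean = series.dropna()
    return float((clean - clean.median()).abs().median() / scale)


# Illustrative data with one missing value and one extreme value.
data = pd.Series([10.0, 11.0, 9.0, None, 12.0, 10.0, 1000.0])

raw = median_abs_dev(data)
normalized = median_abs_dev(data, scale=0.6744897501960817)
print(raw, normalized)
```

Whether the scaled or unscaled value is the right divisor for ares depends on what the original algorithm expects, so treat the scale argument as something to check rather than a recommendation.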

Hope it helps.