business-science / anomalize

Tidy anomaly detection

Home Page: https://business-science.github.io/anomalize/

Anomalies detected within bounds

jh9690 opened this issue

I'm running anomalize on large datasets and occasionally see cases where anomalize() flags a point as an anomaly even though its remainder lies within the remainder_l1 and remainder_l2 bounds. In theory this should not be possible, and unless I'm misreading the output I can't explain the result. In the code below, the gesd method flags rows 12 and 16 as anomalies even though their remainders are greater than the lower bound.

library(dplyr)       # loaded explicitly so %>% is available
library(tibbletime)
library(anomalize)

# Create data frame
df <- data.frame(
  date = c("2003-01-01", "2004-01-01", "2005-01-01", "2006-01-01",
           "2007-01-01", "2008-01-01", "2009-01-01", "2010-01-01",
           "2011-01-01", "2012-01-01", "2013-01-01", "2014-01-01",
           "2015-01-01", "2016-01-01", "2017-01-01", "2018-01-01"),
  val = c(13.54941, 13.57737, 13.61070, 13.62143, 13.64319, 13.64563,
          13.66624, 13.68140, 13.69086, 13.70454, 13.70949, 13.73307,
          13.77554, 13.81119, 13.83046, 13.83948)
)

df$date <- as.Date(df$date)

# Convert to tibbletime object
df_tbl <- as_tbl_time(df, index = date)

# Run anomalize
results <- df_tbl %>%
  time_decompose(val, frequency = "auto", trend = "auto", method = "stl") %>%
  anomalize(remainder, method = "gesd", alpha = 0.05, max_anoms = 0.2) %>%
  time_recompose()
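
To make the inconsistency easier to spot, here is a minimal sketch of a check that pulls out rows flagged as anomalies whose remainder still lies inside the reported bounds. The helper name find_in_bounds_anomalies is hypothetical (it is not part of anomalize), and it assumes the output has the columns remainder, remainder_l1, remainder_l2 and an anomaly column coded "Yes"/"No". The toy data frame uses made-up values purely to show the expected layout; it is not anomalize output.

```r
# Hypothetical helper (not part of anomalize): return rows that are flagged
# as anomalies although their remainder lies within [remainder_l1, remainder_l2].
find_in_bounds_anomalies <- function(x) {
  x <- as.data.frame(x)
  flagged   <- x$anomaly == "Yes"
  in_bounds <- x$remainder >= x$remainder_l1 & x$remainder <= x$remainder_l2
  # which() drops any NA comparisons instead of returning NA rows
  x[which(flagged & in_bounds), , drop = FALSE]
}

# Toy input with made-up values, only to illustrate the column layout.
toy <- data.frame(
  remainder    = c(-0.5, 0.1, 2.0),
  remainder_l1 = c(-1, -1, -1),
  remainder_l2 = c(1, 1, 1),
  anomaly      = c("Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Row 1 is flagged but in bounds; row 3 is flagged and out of bounds.
inconsistent <- find_in_bounds_anomalies(toy)
print(inconsistent)
```

Applied to the reprex above, the call would be find_in_bounds_anomalies(results); any rows it returns are the ones this issue is about.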