Replace current outlier test by the Bonferroni Outlier Test
schnorr opened this issue · comments
As of today, there is no outlier detection for sparse linear algebra (qrmumps), and detection for dense linear algebra (cholesky) is based on the interquartile range (IQR). IQR is weak because it uses no performance model to anticipate expected behavior. We do have fair performance models for both cholesky and qrmumps, which enables us to use the Bonferroni Outlier Test (available in the car package through the outlierTest function). Here's a code snippet that classifies tasks as outliers once the outlierTest function has been called with a model:
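library(car)     # outlierTest
library(dplyr)   # %>%, filter, mutate, select
library(tibble)  # tibble

# Bonferroni p-values of the observations reported by the test
out <- outlierTest(fit, n.max = Inf)
out.tibble <- tibble(Order = names(out$bonf.p),
                     Bonferroni = out$bonf.p) %>%
  filter(Bonferroni < 0.5)
# Flag the rows of df whose position matches a reported observation
df %>%
  mutate(Order = 1:n()) %>%
  mutate(Outlier = Order %in% out.tibble$Order) %>%
  select(-Order)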
Here, fit contains the model. Note that the order of the observations given to the model matters, since outlierTest reports outliers by their indexes. We therefore need to recreate that order on the original df observations and then mark the observations that Bonferroni detected as outliers. The scalability of this approach is yet to be evaluated.
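One way to sidestep the index matching, and possibly to make the scalability easier to assess, is to compute the Bonferroni-corrected p-values for every observation directly with base R. The following is only a sketch under several assumptions: fit is an lm model fitted on df with no rows dropped, the Cost and Duration columns and the synthetic data are made up for illustration, the 0.05 cutoff is an arbitrary choice, and the computation mirrors what outlierTest is understood to do for linear models (a two-sided t-test on the studentized residuals, multiplied by the number of observations) rather than calling it.

```r
# Synthetic stand-in for task data (hypothetical columns, not real traces).
set.seed(42)
df <- data.frame(Cost = seq(1, 100))
df$Duration <- 2 * df$Cost + rnorm(nrow(df), sd = 1)
# Inject two anomalous tasks so that there is something to detect.
df$Duration[c(10, 60)] <- df$Duration[c(10, 60)] + 30

# Stand-in performance model; the real cholesky/qrmumps models would go here.
fit <- lm(Duration ~ Cost, data = df)

# Studentized residuals: one value per row of df, in the same order as df.
rs <- rstudent(fit)

# Two-sided t-test p-value for each residual.
p <- 2 * pt(abs(rs), df = df.residual(fit) - 1, lower.tail = FALSE)

# Bonferroni correction: multiply by the number of tests and cap at 1.
bonf.p <- pmin(1, p * sum(!is.na(rs)))

# Logical flag aligned with df by construction (assumed cutoff of 0.05).
df$Outlier <- unname(!is.na(bonf.p) & bonf.p < 0.05)
```

Because rstudent returns one value per fitted observation, the Outlier column lines up with df without building and matching an Order column. This only holds while the model is fitted on exactly the rows of df, so it should be checked against outlierTest before being relied upon.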