Bug for Model Selection and pseudo R squared for n_features = 1
pkopper opened this issue · comments
This bug concerns lines 52 and 116 of lime.R:
r2 <- fit$deviance / fit$null.deviance
This computes the ratio of residual to null deviance, which is 1 minus the (pseudo) R squared, so the reported value is wrong. The correct formula would be:
r2 <- 1 - fit$deviance / fit$null.deviance
This bug only affects the case n_features = 1. For n_features > 1 a different model (glmnet) is used, and the (pseudo) R squared is extracted from the fitted object rather than computed within the function itself.
However, for n_features = 1 the consequences are severe: model selection always picks the worst-performing model, and the final output reports the complement of the pseudo R squared (1 - R squared), which is typically very close to 1.
The following snippet shows that the buggy formula is also inconsistent with glmnet, whose dev.ratio equals 1 - deviance / null deviance:
library(glmnet)
data(QuickStartExample)
fit <- glmnet(x, y)
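# pseudo R squared computed by hand along the regularization path ...
1 - deviance(fit) / fit$nulldev
# ... matches the values glmnet stores itself
fit$dev.ratio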
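The effect on model selection described above can be reproduced outside of lime. The following is only a sketch under stated assumptions: it uses the built-in mtcars data, three arbitrarily chosen candidate features, and a hypothetical helper r2_for(); none of this is taken from lime.R.

```r
# Illustration only: mtcars, the feature list and r2_for() are assumptions,
# not part of lime.R.
features <- c("wt", "hp", "qsec")

# Fit a single-feature Gaussian glm and return either the buggy ratio
# (deviance / null.deviance) or the corrected pseudo R squared.
r2_for <- function(feature, buggy = FALSE) {
  fit <- glm(reformulate(feature, response = "mpg"), data = mtcars)
  ratio <- fit$deviance / fit$null.deviance
  if (buggy) ratio else 1 - ratio
}

buggy   <- sapply(features, r2_for, buggy = TRUE)
correct <- sapply(features, r2_for)

# The two quantities are complements, so they sum to 1 for every feature
# and rank the candidate models in opposite order.
all.equal(unname(buggy + correct), rep(1, length(features)))

names(which.max(buggy))    # feature selected with the buggy formula
names(which.max(correct))  # feature selected with the corrected formula
```

Because the buggy value is 1 minus the correct one, maximizing it is the same as minimizing the true pseudo R squared, which is why the worst-fitting single-feature model wins.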
I will open a pull request addressing this issue within the next few days.