Bug for Model Selection and pseudo R squared for n_features = 1

pkopper opened this issue 5 years ago · comments

This bug concerns lines 52 and 116 of lime.R:
r2 <- fit$deviance / fit$null.deviance

This makes the (pseudo) R squared wrong. The correct formula is:
r2 <- 1 - fit$deviance / fit$null.deviance
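As a quick sanity check of the corrected formula, here is a minimal sketch using a Gaussian glm on the built-in mtcars data. The data set and the variable names (r2_buggy, r2_fixed, r2_lm) are illustrative assumptions and do not come from lime.R. For a Gaussian GLM the deviance is the residual sum of squares, so the corrected expression should match the ordinary R squared reported by lm:

```r
# Illustrative only: mtcars and the variable names are not taken from lime.R.
fit <- glm(mpg ~ wt, data = mtcars, family = gaussian())

# Expression as currently written in lime.R (fraction of deviance left unexplained)
r2_buggy <- fit$deviance / fit$null.deviance

# Corrected expression (fraction of deviance explained)
r2_fixed <- 1 - fit$deviance / fit$null.deviance

# Reference value: ordinary R squared from the equivalent linear model
r2_lm <- summary(lm(mpg ~ wt, data = mtcars))$r.squared

all.equal(r2_fixed, r2_lm)      # corrected formula agrees with lm
all.equal(r2_buggy, 1 - r2_lm)  # current code yields 1 - R squared instead
```

The two quantities always sum to 1, so a good fit looks bad under the current code and vice versa.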

The bug only affects the case n_features = 1. For n_features > 1 a different model (glmnet) is used, and its (pseudo) R squared is extracted from the fitted object rather than computed within the function itself.

However, for n_features = 1 the consequences are severe: model selection always picks the worst-performing model, and the final output reports the complement of the pseudo R squared (1 - R squared), which is typically very close to 1.
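The effect on model selection can be sketched as follows. This is a hypothetical reconstruction, not lime's actual selection code: the candidate features, the score() helper, and the use of mtcars are all assumptions made for illustration. It fits one single-feature model per candidate and picks the one with the highest score, once with the current expression and once with the corrected one:

```r
# Hypothetical single-feature selection; not the code from lime.R.
candidates <- c("wt", "hp", "qsec")

# score() is an illustrative helper: it fits a one-feature Gaussian GLM and
# returns either the current (buggy) ratio or the corrected pseudo R squared.
score <- function(feature, use_fix) {
  fit <- glm(reformulate(feature, response = "mpg"), data = mtcars)
  ratio <- fit$deviance / fit$null.deviance
  if (use_fix) 1 - ratio else ratio
}

buggy_scores <- vapply(candidates, score, numeric(1), use_fix = FALSE)
fixed_scores <- vapply(candidates, score, numeric(1), use_fix = TRUE)

# Maximising the buggy score selects the feature with the lowest true R squared
picked_buggy <- names(which.max(buggy_scores))
picked_fixed <- names(which.max(fixed_scores))
worst_true   <- names(which.min(fixed_scores))

picked_buggy == worst_true  # TRUE: the current code picks the worst model
```

Because the buggy score is exactly 1 minus the corrected one, maximising it is equivalent to minimising the true pseudo R squared, whatever the data.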

The following snippet shows that the buggy code is also inconsistent with glmnet, where 1 - deviance / null deviance matches the reported dev.ratio:

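library(glmnet)
data(QuickStartExample)
# Recent glmnet versions load the data as a list; older ones load x and y directly.
if (exists("QuickStartExample")) { x <- QuickStartExample$x; y <- QuickStartExample$y }
fit <- glmnet(x, y)
# Both expressions give the same values along the lambda path.
1 - deviance(fit) / fit$nulldev
fit$dev.ratio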

Within the next few days, I will open a pull request that fixes this issue.