Wrong AIC values with tab_model
lfpdroubi opened this issue · comments
When comparing models with tab_model(), the AIC of the second fit (844.889) seems to be wrong.
Reproducible example:
m <- c(5000, 360)
s <- c(2000, 50)
r <- -0.75
sigma <- sqrt(log(s^2/m^2 + 1))
mu <- log(m) - sigma^2/2
rho <- log(r*prod(s)/prod(m) + 1)
# Random data generation
library(MASS)
n <- 50
set.seed(2)
dados <- exp(mvrnorm(n = n, mu = mu, Sigma = diag(sigma^2 - rho) + rho,
empirical = TRUE))
colnames(dados) <- c("PU", "Area")
dados <- as.data.frame(dados)
# Wrong fit:
wfit <- lm(PU ~ Area, data = dados)
# Good fit:
fit <- lm(log(PU)~log(Area), data = dados)
library(sjPlot)
tab_model(wfit, fit, show.aic = TRUE) # AIC of second (good) fit seems to be wrong!
AIC(fit) # 0.59
tab_model() relies on performance::performance_aic() to get the AIC. That function returns a corrected AIC for models with a transformed response. This is more accurate when comparing models, since the underlying "variation" in the data should be similar if the raw data is the same. See the example and the links to the docs:
m <- c(5000, 360)
s <- c(2000, 50)
r <- -0.75
sigma <- sqrt(log(s^2/m^2 + 1))
mu <- log(m) - sigma^2/2
rho <- log(r*prod(s)/prod(m) + 1)
# Random data generation
library(MASS)
n <- 50
set.seed(2)
dados <- exp(mvrnorm(n = n, mu = mu, Sigma = diag(sigma^2 - rho) + rho,
empirical = TRUE))
colnames(dados) <- c("PU", "Area")
dados <- as.data.frame(dados)
# Wrong fit:
wfit <- lm(PU ~ Area, data = dados)
# Good fit:
fit <- lm(log(PU)~log(Area), data = dados)
# comparable results
performance::performance_aic(wfit)
#> [1] 863.5921
performance::performance_aic(fit)
#> [1] 844.8888
See ?performance::performance_aic:
performance_aic() correctly detects transformed response and, unlike stats::AIC(), returns the "corrected" AIC value on the original scale. To get back to the original scale, the likelihood of the model is multiplied by the Jacobian/derivative of the transformation.
See also https://easystats.github.io/performance/reference/performance_aicc.html and https://easystats.github.io/insight/reference/get_loglikelihood.html (argument check_response).
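For intuition, the Jacobian correction for a log-transformed response can be reproduced by hand. The following is a minimal sketch in base R, not the implementation used in performance or insight. It uses hypothetical toy data (x, y) rather than the data from this thread, and the names ll_orig, aic_manual and aic_shortcut are illustrative only. For z = log(y) the derivative is dz/dy = 1/y, so the log-likelihood on the original scale is the log-scale log-likelihood minus sum(log(y)), and the AIC grows by 2 * sum(log(y)).

```r
# Hypothetical toy data (assumption: NOT the data from this thread)
set.seed(1)
x <- runif(50, min = 1, max = 10)
y <- exp(1 + 0.5 * log(x) + rnorm(50, sd = 0.2))

fit_log <- lm(log(y) ~ log(x))

# Log-likelihood on the log scale; this is what stats::AIC() uses
ll_log <- as.numeric(logLik(fit_log))

# Jacobian term for z = log(y): sum(log(|dz/dy|)) = sum(log(1 / y)) = -sum(log(y))
ll_orig <- ll_log - sum(log(y))

# Number of estimated parameters (intercept, slope, residual sigma)
k <- attr(logLik(fit_log), "df")

# AIC on the original response scale
aic_manual <- -2 * ll_orig + 2 * k

# Equivalent shortcut: shift the log-scale AIC by twice the sum of log(y)
aic_shortcut <- AIC(fit_log) + 2 * sum(log(y))

# Cross-check: the same log-likelihood from the log-normal density of y,
# using the maximum-likelihood estimate of sigma (divides by n, not n - 2)
sigma_ml <- sqrt(mean(residuals(fit_log)^2))
ll_lnorm <- sum(dlnorm(y, meanlog = fitted(fit_log), sdlog = sigma_ml, log = TRUE))

c(stats_aic = AIC(fit_log), corrected_aic = aic_manual)
```

Under these assumptions, the corrected value is the one that can meaningfully be compared with AIC(lm(y ~ x)), because both likelihoods then refer to y itself rather than to log(y).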
Perfect, @strengejacke! Although I think you should use different argument names (just a suggestion), like show_adj_aic and show_aic. Then it would be clear to users of your package what's happening behind the curtain. Thanks a lot!