strengejacke / sjPlot

sjPlot - Data Visualization for Statistics in Social Science

Home Page: https://strengejacke.github.io/sjPlot

Wrong AIC values with tab_model

lfpdroubi opened this issue

When comparing models with tab_model(), the AIC of the second fit (844.889) seems to be wrong.

Reproducible example:

m <- c(5000, 360)
s <- c(2000, 50)
r <- -0.75
sigma <- sqrt(log(s^2/m^2 + 1))
mu <- log(m) - sigma^2/2
rho <- log(r*prod(s)/prod(m) + 1)

# Random data generation
library(MASS)
n <- 50
set.seed(2)
dados <- exp(mvrnorm(n = n, mu = mu, Sigma = diag(sigma^2 - rho) + rho,
            empirical = TRUE))
colnames(dados) <- c("PU", "Area")
dados <- as.data.frame(dados)

# Wrong fit:
wfit <- lm(PU ~ Area, data = dados)

# Good fit:
fit <- lm(log(PU)~log(Area), data = dados)

library(sjPlot)

tab_model(wfit, fit, show.aic = TRUE) # AIC of the second (good) fit seems to be wrong!

AIC(fit) # 0.59

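tab_model() relies on performance::performance_aic() to get the AIC. For models with a transformed response, that function returns a corrected AIC on the original scale of the response. This is the more useful value for model comparison: stats::AIC() evaluates the likelihood of log(PU) for the second model but of PU for the first, so those two numbers are not comparable, whereas the corrected values both refer to the same raw data. See the example and the links to the docs: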

m <- c(5000, 360)
s <- c(2000, 50)
r <- -0.75
sigma <- sqrt(log(s^2/m^2 + 1))
mu <- log(m) - sigma^2/2
rho <- log(r*prod(s)/prod(m) + 1)

# Random data generation
library(MASS)
n <- 50
set.seed(2)
dados <- exp(mvrnorm(n = n, mu = mu, Sigma = diag(sigma^2 - rho) + rho,
            empirical = TRUE))
colnames(dados) <- c("PU", "Area")
dados <- as.data.frame(dados)

# Wrong fit:
wfit <- lm(PU ~ Area, data = dados)

# Good fit:
fit <- lm(log(PU)~log(Area), data = dados)

# comparable results
performance::performance_aic(wfit)
#> [1] 863.5921
performance::performance_aic(fit)
#> [1] 844.8888

See ?performance::performance_aic:

performance_aic() correctly detects transformed response and, unlike stats::AIC(), returns the "corrected" AIC value on the original scale. To get back to the original scale, the likelihood of the model is multiplied by the Jacobian/derivative of the transformation.

See also https://easystats.github.io/performance/reference/performance_aicc.html and https://easystats.github.io/insight/reference/get_loglikelihood.html (argument check_response).
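For a log-transformed response, the Jacobian correction described above can be reproduced by hand in base R. The following is a minimal sketch under stated assumptions, not the implementation used by performance: the data are simulated here (they are not the data from this issue), and the names area, pu, fit_log, aic_original_scale and aic_check are made up for illustration.

```r
# Simulated data for illustration only (not the data from this issue)
set.seed(1)
n <- 50
area <- rlnorm(n, meanlog = 5.9, sdlog = 0.15)
pu <- exp(12 - 0.6 * log(area) + rnorm(n, sd = 0.2))

fit_log <- lm(log(pu) ~ log(area))

# stats::AIC() uses the likelihood of log(pu), not of pu
aic_log_scale <- AIC(fit_log)

# The derivative of log(y) is 1/y, so the log-likelihood on the original
# scale is logLik - sum(log(y)); the AIC therefore grows by 2 * sum(log(y))
aic_original_scale <- aic_log_scale + 2 * sum(log(pu))

# Cross-check: compute the AIC directly from the log-normal density of pu
sigma_ml <- sqrt(mean(residuals(fit_log)^2))  # ML estimate of sigma, as in logLik()
ll_original <- sum(dlnorm(pu, meanlog = fitted(fit_log),
                          sdlog = sigma_ml, log = TRUE))
k <- length(coef(fit_log)) + 1                # coefficients + residual variance
aic_check <- -2 * ll_original + 2 * k

# Both routes give the same value
stopifnot(isTRUE(all.equal(aic_original_scale, aic_check)))
```

By the same reasoning, the gap between the two values reported in this thread (844.889 from tab_model() versus 0.59 from AIC(fit)) would be expected to equal 2 * sum(log(dados$PU)).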

Perfect, @strengejacke! Just as a suggestion, though: I think you could use distinct argument names, like show_adj_aic and show_aic. Then it would be clear to users of your package what's happening behind the curtains. Thanks a lot!