tab_model standardization by 2 sd does not seem to work

Question

apolussa opened this issue 2 years ago

The section 'standardized estimates' describes the option to use Gelman's (2008) method, which rescales continuous variables and divides them by 2 SDs, and simply rescales binary variables.
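As a rough illustration of that rescaling, here is a minimal base R sketch. It assumes that "rescales" for a binary variable means centering only (no division), which is one reading of the description above. The helper name rescale_2sd is hypothetical and is not part of sjPlot, arm, or jtools.

```r
# Hypothetical helper for illustration only (not a function from any package).
rescale_2sd <- function(x) {
  if (length(unique(x[!is.na(x)])) == 2) {
    # Binary input (assumption): center only, so the two levels stay one unit apart.
    x - mean(x, na.rm = TRUE)
  } else {
    # Continuous input: center, then divide by two standard deviations.
    (x - mean(x, na.rm = TRUE)) / (2 * sd(x, na.rm = TRUE))
  }
}

hp_2sd <- rescale_2sd(mtcars$hp)  # continuous predictor
am_ctr <- rescale_2sd(mtcars$am)  # binary predictor (0/1)

sd(hp_2sd)           # 0.5 by construction
diff(range(am_ctr))  # 1: the gap between the two levels is unchanged
```

With this scaling, a one-unit change in a continuous predictor corresponds to a 2 SD change in the original variable, which puts its coefficient on roughly the same footing as that of a binary predictor.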

I am having trouble getting the output of tab_model(lm, show.std = "std2") to match the results from manual standardization, from Gelman's standardize() function in the arm package, and from the jtools::summ() function, which also supports standardization.

I may be under-specifying the call or missing something, but here is a quick and dirty example (by no means the best place to do the standardization).

# basic regression:
lm_1 <- lm(mpg ~ cyl*hp+wt, data = mtcars)

# unstandardized
summary(lm_1)

# library(arm)
# standardized using arm package
summary(arm::standardize(lm_1))

# library(jtools)
# standardized using jtools:
jtools::summ(lm_1, scale = TRUE, n.sd = 2)

# manual standardization: center each predictor, then divide by 2 SDs
mtcars$cylSTD <- (mtcars$cyl - mean(mtcars$cyl, na.rm = TRUE)) /
  (2 * sd(mtcars$cyl, na.rm = TRUE))
mtcars$hpSTD <- (mtcars$hp - mean(mtcars$hp, na.rm = TRUE)) /
  (2 * sd(mtcars$hp, na.rm = TRUE))
mtcars$wtSTD <- (mtcars$wt - mean(mtcars$wt, na.rm = TRUE)) /
  (2 * sd(mtcars$wt, na.rm = TRUE))

summary(lm(mpg ~ cylSTD*hpSTD+wtSTD, data = mtcars))

# library(sjPlot)
# standardized using sjPlot
sjPlot::tab_model(lm_1, show.std = "std2")

Output from summary(arm::standardize(lm_1)):

Output from sjPlot::tab_model(lm_1, show.std = "std2"):

Thanks,
Alex

apolussa · Answer 1 · Fri Jan 27 2023 08:31:26 GMT+0800 (China Standard Time)

Do other folks have this issue? It seems there is work that uses this package for reporting standardized coefficients...

Daniel · Answer 2 · Fri Mar 31 2023 14:08:13 GMT+0800 (China Standard Time)

Thanks for the feedback. The reason is that there is no single approach to standardization. For example, you can standardize the complete data, including the response, which is what is done here. If you exclude the response from standardization, your results are replicated. See the following examples and the vignette on standardization (easystats packages such as datawizard and parameters are used internally, which is why they appear here):

model <- lm(mpg ~ cyl * hp + wt, data = mtcars)
summary(datawizard::standardize(model, method = "refit"))
#> 
#> Call:
#> lm(formula = mpg ~ cyl * hp + wt, data = data_std)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -0.5548 -0.2347 -0.1023  0.2018  0.7104 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  -0.3105     0.1272  -2.441  0.02146 *  
#> cyl           0.0113     0.1778   0.064  0.94978    
#> hp           -0.5269     0.1651  -3.191  0.00358 ** 
#> wt           -0.5065     0.1074  -4.718 6.51e-05 ***
#> cyl:hp        0.3851     0.1350   2.852  0.00823 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.372 on 27 degrees of freedom
#> Multiple R-squared:  0.8795, Adjusted R-squared:  0.8616 
#> F-statistic: 49.25 on 4 and 27 DF,  p-value: 5.065e-12
summary(datawizard::standardize(model, method = "refit", two_sd = TRUE))
#> 
#> Call:
#> lm(formula = mpg ~ cyl * hp + wt, data = data_std)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -0.5548 -0.2347 -0.1023  0.2018  0.7104 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  -0.3105     0.1272  -2.441  0.02146 *  
#> cyl           0.0226     0.3555   0.064  0.94978    
#> hp           -1.0538     0.3303  -3.191  0.00358 ** 
#> wt           -1.0130     0.2147  -4.718 6.51e-05 ***
#> cyl:hp        1.5403     0.5400   2.852  0.00823 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.372 on 27 degrees of freedom
#> Multiple R-squared:  0.8795, Adjusted R-squared:  0.8616 
#> F-statistic: 49.25 on 4 and 27 DF,  p-value: 5.065e-12
summary(datawizard::standardize(model, method = "refit", two_sd = TRUE, include_response = FALSE))
#> 
#> Call:
#> lm(formula = mpg ~ cyl * hp + wt, data = data_std)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.3440 -1.4144 -0.6166  1.2160  4.2815 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  18.2190     0.7666  23.766  < 2e-16 ***
#> cyl           0.1362     2.1427   0.064  0.94978    
#> hp           -6.3515     1.9905  -3.191  0.00358 ** 
#> wt           -6.1052     1.2942  -4.718 6.51e-05 ***
#> cyl:hp        9.2833     3.2548   2.852  0.00823 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.242 on 27 degrees of freedom
#> Multiple R-squared:  0.8795, Adjusted R-squared:  0.8616 
#> F-statistic: 49.25 on 4 and 27 DF,  p-value: 5.065e-12

Created on 2023-03-31 with reprex v2.0.2

https://easystats.github.io/parameters/articles/standardize_parameters_effsize.html
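The relationship between the last two outputs can also be checked with base R alone. The sketch below is a manual re-creation, not the code datawizard runs internally. It assumes the refit amounts to rescaling the predictors by 2 SDs and, when the response is included, z-scoring mpg by 1 SD; the outputs above suggest this, since the slopes differ by a factor of about 6.03, the SD of mpg. The helper name two_sd is made up for illustration.

```r
# Hypothetical helper: center and divide by two standard deviations.
two_sd <- function(x) (x - mean(x)) / (2 * sd(x))

# Predictors rescaled by 2 SDs, response left in its original units.
d_raw <- transform(mtcars, cyl = two_sd(cyl), hp = two_sd(hp), wt = two_sd(wt))

# Same predictors, but with the response z-scored (assumption: 1 SD, not 2).
d_std <- transform(d_raw, mpg = (mpg - mean(mpg)) / sd(mpg))

fit_raw <- lm(mpg ~ cyl * hp + wt, data = d_raw)
fit_std <- lm(mpg ~ cyl * hp + wt, data = d_std)

# Every slope differs by the same factor: the SD of the original response.
ratio <- coef(fit_raw)[-1] / coef(fit_std)[-1]
all.equal(unname(ratio), rep(sd(mtcars$mpg), length(ratio)))
```

Because rescaling the response is a linear transformation, the t values, p values, and R-squared are the same in both fits; only the units of the estimates change.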

Daniel · Answer 3 · Fri Mar 31 2023 14:28:12 GMT+0800 (China Standard Time)

I added a std.response argument, so you can now do:

tab_model(model, show.std = "std2", std.response = TRUE)
tab_model(model, show.std = "std2", std.response = FALSE)