This is my first attempt at writing an R package. During my time at the BC Cancer Agency, I researched a lot of summary statistics packages online. Many of these packages produced beautiful tables but were not very flexible when it came to handling NA
values for example. For this package, I wrote code which produces summary statistics tables that represents the data best in my opinion. The tblgoat
package can produce overall summary statistics tables and can also produce tables by groups.
One thing to note for this package is that it transforms factors
into character
variables. Hence, if there are certain categories in the data set that have no values, they won't be represented in the table as count 0 (0%). They will just be left out. Something to be aware of.
Also, this package can return tibbles
if you specify kable = FALSE
in the tbl_goat
function. This is made available if you want to further work with a data frame rather than just getting the markdown format with kable()
.
If you want to use more flexible and very sophisticated summary statistics packages, I would recommend using the arsenal package or the gtsummary package. Both are available on CRAN.
# install.packages("devtools")
devtools::install_github("Pascal-Schmidt/tblGoat")
library(tidyverse)
library(tblgoat)
mtcars %>%
dplyr::mutate_at(vars("cyl", "am", "vs", "gear", "carb"),
.funs = ~ as.factor(.)) %>%
dplyr::as_tibble() -> mtcars
tblgoat::tbl_goat(mtcars)
To create an overall summary statistics table, we just have to pass in a data frame into the tbl_goat
function.
Characteristic | Overall |
---|---|
disp | |
Mean (sd) | 230.72 (123.94) |
Median (Q1 - Q3) | 196.3 (120.83 - 326) |
Range | 71.1 - 472 |
drat | |
Mean (sd) | 3.6 (0.53) |
Median (Q1 - Q3) | 3.7 (3.08 - 3.92) |
Range | 2.76 - 4.93 |
hp | |
Mean (sd) | 146.69 (68.56) |
Median (Q1 - Q3) | 123 (96.5 - 180) |
Range | 52 - 335 |
mpg | |
Mean (sd) | 20.09 (6.03) |
Median (Q1 - Q3) | 19.2 (15.43 - 22.8) |
Range | 10.4 - 33.9 |
qsec | |
Mean (sd) | 17.85 (1.79) |
Median (Q1 - Q3) | 17.71 (16.89 - 18.9) |
Range | 14.5 - 22.9 |
wt | |
Mean (sd) | 3.22 (0.98) |
Median (Q1 - Q3) | 3.33 (2.58 - 3.61) |
Range | 1.51 - 5.42 |
am | |
0 | 19 (59.38%) |
1 | 13 (40.62%) |
carb | |
1 | 7 (21.88%) |
2 | 10 (31.25%) |
3 | 3 (9.38%) |
4 | 10 (31.25%) |
6 | 1 (3.12%) |
8 | 1 (3.12%) |
cyl | |
4 | 11 (34.38%) |
6 | 7 (21.88%) |
8 | 14 (43.75%) |
gear | |
3 | 15 (46.88%) |
4 | 12 (37.5%) |
5 | 5 (15.62%) |
vs | |
0 | 18 (56.25%) |
1 | 14 (43.75%) |
tblgoat::tbl_goat(mtcars, grouping_var = "am")
0 N = 19 (59.38%) | 1 N = 13 (40.62%) | Total N = 32 (100%) | p-values | |
---|---|---|---|---|
carb | 0.284 | |||
1 | 3 (15.79%) | 4 (30.77%) | 7 (21.88%) | |
2 | 6 (31.58%) | 4 (30.77%) | 10 (31.25%) | |
3 | 3 (15.79%) | No Data | 3 (9.38%) | |
4 | 7 (36.84%) | 3 (23.08%) | 10 (31.25%) | |
6 | No Data | 1 (7.69%) | 1 (3.12%) | |
8 | No Data | 1 (7.69%) | 1 (3.12%) | |
cyl | 0.013 | |||
4 | 3 (15.79%) | 8 (61.54%) | 11 (34.38%) | |
6 | 4 (21.05%) | 3 (23.08%) | 7 (21.88%) | |
8 | 12 (63.16%) | 2 (15.38%) | 14 (43.75%) | |
gear | < 0.001 | |||
3 | 15 (78.95%) | No Data | 15 (46.88%) | |
4 | 4 (21.05%) | 8 (61.54%) | 12 (37.5%) | |
5 | No Data | 5 (38.46%) | 5 (15.62%) | |
vs | 0.556 | |||
0 | 12 (63.16%) | 6 (46.15%) | 18 (56.25%) | |
1 | 7 (36.84%) | 7 (53.85%) | 14 (43.75%) | |
disp | 0.001 | |||
Mean (sd) | 290.38 (110.17) | 143.53 (87.2) | 230.72 (123.94) | |
Median (Q1 - Q3) | 275.8 (196.3 - 360) | 120.3 (79 - 160) | 196.3 (120.83 - 326) | |
Range | 120.1 - 472 | 71.1 - 351 | 71.1 - 472 | |
drat | < 0.001 | |||
Mean (sd) | 3.29 (0.39) | 4.05 (0.36) | 3.6 (0.53) | |
Median (Q1 - Q3) | 3.15 (3.07 - 3.7) | 4.08 (3.85 - 4.22) | 3.7 (3.08 - 3.92) | |
Range | 2.76 - 3.92 | 3.54 - 4.93 | 2.76 - 4.93 | |
hp | 0.044 | |||
Mean (sd) | 160.26 (53.91) | 126.85 (84.06) | 146.69 (68.56) | |
Median (Q1 - Q3) | 175 (116.5 - 192.5) | 109 (66 - 113) | 123 (96.5 - 180) | |
Range | 62 - 245 | 52 - 335 | 52 - 335 | |
mpg | 0.002 | |||
Mean (sd) | 17.15 (3.83) | 24.39 (6.17) | 20.09 (6.03) | |
Median (Q1 - Q3) | 17.3 (14.95 - 19.2) | 22.8 (21 - 30.4) | 19.2 (15.43 - 22.8) | |
Range | 10.4 - 24.4 | 15 - 33.9 | 10.4 - 33.9 | |
qsec | 0.258 | |||
Mean (sd) | 18.18 (1.75) | 17.36 (1.79) | 17.85 (1.79) | |
Median (Q1 - Q3) | 17.82 (17.18 - 19.17) | 17.02 (16.46 - 18.61) | 17.71 (16.89 - 18.9) | |
Range | 15.41 - 22.9 | 14.5 - 19.9 | 14.5 - 22.9 | |
wt | < 0.001 | |||
Mean (sd) | 3.77 (0.78) | 2.41 (0.62) | 3.22 (0.98) | |
Median (Q1 - Q3) | 3.52 (3.44 - 3.84) | 2.32 (1.94 - 2.78) | 3.33 (2.58 - 3.61) | |
Range | 2.46 - 5.42 | 1.51 - 3.57 | 1.51 - 5.42 |
If you want to startify the table by multiple columns, just add more variables in the groupig_var
vector.
tblgoat::tbl_goat(mtcars, grouping_var = c("am", "vs"))
0_1 N = 7 (21.88%) | 1_1 N = 7 (21.88%) | 0_0 N = 12 (37.5%) | 1_0 N = 6 (18.75%) | Total N = 32 (100%) | p-values | |
---|---|---|---|---|---|---|
carb | 0.028 | |||||
1 | 3 (42.86%) | 4 (57.14%) | No Data | No Data | 7 (21.88%) | |
2 | 2 (28.57%) | 3 (42.86%) | 4 (33.33%) | 1 (16.67%) | 10 (31.25%) | |
3 | No Data | No Data | 3 (25%) | No Data | 3 (9.38%) | |
4 | 2 (28.57%) | No Data | 5 (41.67%) | 3 (50%) | 10 (31.25%) | |
6 | No Data | No Data | No Data | 1 (16.67%) | 1 (3.12%) | |
8 | No Data | No Data | No Data | 1 (16.67%) | 1 (3.12%) | |
cyl | < 0.001 | |||||
4 | 3 (42.86%) | 7 (100%) | No Data | 1 (16.67%) | 11 (34.38%) | |
6 | 4 (57.14%) | No Data | No Data | 3 (50%) | 7 (21.88%) | |
8 | No Data | No Data | 12 (100%) | 2 (33.33%) | 14 (43.75%) | |
gear | < 0.001 | |||||
3 | 3 (42.86%) | No Data | 12 (100%) | No Data | 15 (46.88%) | |
4 | 4 (57.14%) | 6 (85.71%) | No Data | 2 (33.33%) | 12 (37.5%) | |
5 | No Data | 1 (14.29%) | No Data | 4 (66.67%) | 5 (15.62%) | |
disp | < 0.001 | |||||
Mean (sd) | 175.11 (49.13) | 89.8 (18.8) | 357.62 (71.82) | 206.22 (95.23) | 230.72 (123.94) | |
Median (Q1 - Q3) | 167.6 (143.75 - 196.3) | 79 (77.2 - 101.55) | 355 (296.95 - 410) | 160 (148.75 - 265.75) | 196.3 (120.83 - 326) | |
Range | 120.1 - 258 | 71.1 - 121 | 275.8 - 472 | 120.3 - 351 | 71.1 - 472 | |
drat | < 0.001 | |||||
Mean (sd) | 3.57 (0.46) | 4.15 (0.38) | 3.12 (0.23) | 3.94 (0.34) | 3.6 (0.53) | |
Median (Q1 - Q3) | 3.7 (3.38 - 3.92) | 4.08 (3.96 - 4.17) | 3.08 (3.05 - 3.17) | 3.9 (3.69 - 4.14) | 3.7 (3.08 - 3.92) | |
Range | 2.76 - 3.92 | 3.77 - 4.93 | 2.76 - 3.73 | 3.54 - 4.43 | 2.76 - 4.93 | |
hp | < 0.001 | |||||
Mean (sd) | 102.14 (20.93) | 80.57 (24.14) | 194.17 (33.36) | 180.83 (98.82) | 146.69 (68.56) | |
Median (Q1 - Q3) | 105 (96 - 116.5) | 66 (65.5 - 101) | 180 (175 - 218.75) | 142.5 (110 - 241.75) | 123 (96.5 - 180) | |
Range | 62 - 123 | 52 - 113 | 150 - 245 | 91 - 335 | 52 - 335 | |
mpg | < 0.001 | |||||
Mean (sd) | 20.74 (2.47) | 28.37 (4.76) | 15.05 (2.77) | 19.75 (4.01) | 20.09 (6.03) | |
Median (Q1 - Q3) | 21.4 (18.65 - 22.15) | 30.4 (25.05 - 31.4) | 15.2 (14.05 - 16.62) | 20.35 (16.78 - 21) | 19.2 (15.43 - 22.8) | |
Range | 17.8 - 24.4 | 21.4 - 33.9 | 10.4 - 19.2 | 15 - 26 | 10.4 - 33.9 | |
qsec | < 0.001 | |||||
Mean (sd) | 19.97 (1.46) | 18.7 (0.95) | 17.14 (0.8) | 15.8 (1.09) | 17.85 (1.79) | |
Median (Q1 - Q3) | 20 (19.17 - 20.12) | 18.61 (18.56 - 19.18) | 17.35 (16.98 - 17.66) | 15.98 (14.82 - 16.64) | 17.71 (16.89 - 18.9) | |
Range | 18.3 - 22.9 | 16.9 - 19.9 | 15.41 - 18 | 14.5 - 17.02 | 14.5 - 22.9 | |
wt | < 0.001 | |||||
Mean (sd) | 3.19 (0.35) | 2.03 (0.44) | 4.1 (0.77) | 2.86 (0.49) | 3.22 (0.98) | |
Median (Q1 - Q3) | 3.21 (3.17 - 3.44) | 1.94 (1.73 - 2.26) | 3.81 (3.56 - 4.37) | 2.82 (2.66 - 3.1) | 3.33 (2.58 - 3.61) | |
Range | 2.46 - 3.46 | 1.51 - 2.78 | 3.44 - 5.42 | 2.14 - 3.57 | 1.51 - 5.42 |
tblgoat::tbl_goat(mtcars, grouping_var = "am", p_value = F, header = F, total = F)
0 | 1 | |
---|---|---|
carb | ||
1 | 3 (15.79%) | 4 (30.77%) |
2 | 6 (31.58%) | 4 (30.77%) |
3 | 3 (15.79%) | No Data |
4 | 7 (36.84%) | 3 (23.08%) |
6 | No Data | 1 (7.69%) |
8 | No Data | 1 (7.69%) |
cyl | ||
4 | 3 (15.79%) | 8 (61.54%) |
6 | 4 (21.05%) | 3 (23.08%) |
8 | 12 (63.16%) | 2 (15.38%) |
gear | ||
3 | 15 (78.95%) | No Data |
4 | 4 (21.05%) | 8 (61.54%) |
5 | No Data | 5 (38.46%) |
vs | ||
0 | 12 (63.16%) | 6 (46.15%) |
1 | 7 (36.84%) | 7 (53.85%) |
disp | ||
Mean (sd) | 290.38 (110.17) | 143.53 (87.2) |
Median (Q1 - Q3) | 275.8 (196.3 - 360) | 120.3 (79 - 160) |
Range | 120.1 - 472 | 71.1 - 351 |
drat | ||
Mean (sd) | 3.29 (0.39) | 4.05 (0.36) |
Median (Q1 - Q3) | 3.15 (3.07 - 3.7) | 4.08 (3.85 - 4.22) |
Range | 2.76 - 3.92 | 3.54 - 4.93 |
hp | ||
Mean (sd) | 160.26 (53.91) | 126.85 (84.06) |
Median (Q1 - Q3) | 175 (116.5 - 192.5) | 109 (66 - 113) |
Range | 62 - 245 | 52 - 335 |
mpg | ||
Mean (sd) | 17.15 (3.83) | 24.39 (6.17) |
Median (Q1 - Q3) | 17.3 (14.95 - 19.2) | 22.8 (21 - 30.4) |
Range | 10.4 - 24.4 | 15 - 33.9 |
qsec | ||
Mean (sd) | 18.18 (1.75) | 17.36 (1.79) |
Median (Q1 - Q3) | 17.82 (17.18 - 19.17) | 17.02 (16.46 - 18.61) |
Range | 15.41 - 22.9 | 14.5 - 19.9 |
wt | ||
Mean (sd) | 3.77 (0.78) | 2.41 (0.62) |
Median (Q1 - Q3) | 3.52 (3.44 - 3.84) | 2.32 (1.94 - 2.78) |
Range | 2.46 - 5.42 | 1.51 - 3.57 |
median_gdp <- median(gapminder$gdpPercap)
gapminder %>%
select(-country) %>%
mutate(gdpPercap = ifelse(gdpPercap > median_gdp, "high", "low")) %>%
mutate(gdpPercap = factor(gdpPercap)) %>%
mutate(pop = pop / 1000000) -> gapminder
gapminder <- lapply(gapminder, function(x) x[sample(c(TRUE, NA),
prob = c(0.9, 0.1),
size = length(x),
replace = TRUE
)])
gapminder <- as_tibble(gapminder)
tblgoat::tbl_goat(gapminder, grouping_var = "continent")
Africa N = 567 (37.11%) | Americas N = 263 (17.21%) | Asia N = 357 (23.36%) | Europe N = 319 (20.88%) | Oceania N = 22 (1.44%) | Total N = 1528 (100%) | p-values | |
---|---|---|---|---|---|---|---|
gdpPercap | < 0.001 | ||||||
high | 87 (17.09%) | 178 (73.55%) | 137 (42.28%) | 263 (92.28%) | 20 (100%) | 685 (49.64%) | |
low | 422 (82.91%) | 64 (26.45%) | 187 (57.72%) | 22 (7.72%) | No Data | 695 (50.36%) | |
Missing | 58 | 21 | 33 | 34 | 2 | 148 | |
lifeExp | < 0.001 | ||||||
Mean (sd) | 48.82 (9.16) | 64.79 (9.2) | 60.35 (12.16) | 71.92 (5.4) | 73.97 (3.65) | 59.42 (13) | |
Median (Q1 - Q3) | 47.79 (42.37 - 54.41) | 67.05 (58.95 - 71.72) | 62.3 (51.5 - 70.2) | 72.19 (69.59 - 75.44) | 73.49 (71.1 - 76.33) | 60.77 (48.09 - 70.84) | |
Range | 23.6 - 76.44 | 37.58 - 80.65 | 28.8 - 82.6 | 43.59 - 81.76 | 69.12 - 81.23 | 23.6 - 82.6 | |
Missing | 55 | 24 | 39 | 36 | 1 | 155 | |
pop | < 0.001 | ||||||
Mean (sd) | 10.3 (16.59) | 22.35 (47.61) | 81.11 (218.87) | 16.85 (20.4) | 8.93 (6.4) | 30.5 (112.6) | |
Median (Q1 - Q3) | 4.57 (1.37 - 10.72) | 6.31 (3.03 - 17.23) | 14.62 (3.83 - 46.77) | 8.43 (4.32 - 21.07) | 8.69 (3.21 - 14.07) | 7.15 (2.79 - 19.77) | |
Range | 0.06 - 135.03 | 0.66 - 301.14 | 0.14 - 1318.68 | 0.15 - 82.4 | 1.99 - 20.43 | 0.06 - 1318.68 | |
Missing | 73 | 23 | 34 | 29 | 1 | 160 | |
year | 0.886 | ||||||
Mean (sd) | 1979.12 (17.26) | 1980.01 (17.17) | 1980.22 (17.24) | 1979.17 (16.94) | 1978.82 (17.22) | 1979.54 (17.15) | |
Median (Q1 - Q3) | 1977 (1962 - 1992) | 1982 (1967 - 1997) | 1982 (1967 - 1997) | 1977 (1967 - 1992) | 1979.5 (1963.25 - 1992) | 1982 (1967 - 1992) | |
Range | 1952 - 2007 | 1952 - 2007 | 1952 - 2007 | 1952 - 2007 | 1952 - 2007 | 1952 - 2007 | |
Missing | 56 | 29 | 32 | 33 | 0 | 150 |