Change of results when using tb() in grouped freq()
Crismoc opened this issue
After getting results from a grouped freq(), I would like to put them in an object in tibble or data.frame format. When I use tb(), the percentages change, which might be unintended behavior:
library(summarytools)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
tobacco |>
  group_by(smoker) |>
  freq(diseased)
#> Frequencies
#> diseased
#> Type: Factor
#> Group: smoker = Yes
#>
#>               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes    125     41.95          41.95     41.95          41.95
#>          No    173     58.05         100.00     58.05         100.00
#>        <NA>      0                               0.00         100.00
#>       Total    298    100.00         100.00    100.00         100.00
#>
#> Group: smoker = No
#>
#>               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes     99     14.10          14.10     14.10          14.10
#>          No    603     85.90         100.00     85.90         100.00
#>        <NA>      0                               0.00         100.00
#>       Total    702    100.00         100.00    100.00         100.00
tobacco |>
  group_by(smoker) |>
  freq(diseased) |>
  tb(na.rm = TRUE)
#> # A tibble: 4 × 5
#>   smoker diseased  freq   pct pct_cum
#>   <fct>  <fct>    <dbl> <dbl>   <dbl>
#> 1 Yes    Yes        125 21.0     21.0
#> 2 Yes    No         173 29.0     50
#> 3 No     Yes         99  7.05    57.1
#> 4 No     No         603 42.9    100
Created on 2023-04-19 with reprex v2.0.2
Is there a way to transform the same results to a tibble or data.frame?
Could you please show what the desired resulting df would look like?
I would expect to get something like this:
library(summarytools)
library(dplyr)
tobacco |>
  group_by(smoker) |>
  reframe(
    level = names(table(diseased)),
    Freq = table(diseased),
    `% Valid` = prop.table(table(diseased)))
#> # A tibble: 4 × 4
#>   smoker level Freq        `% Valid`
#>   <fct>  <chr> <table[1d]> <table[1d]>
#> 1 Yes    Yes   125         0.4194631
#> 2 Yes    No    173         0.5805369
#> 3 No     Yes    99         0.1410256
#> 4 No     No    603         0.8589744
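The reframe() call above returns `<table[1d]>` columns. A possible variant that yields plain numeric columns is sketched below with dplyr's count() and a grouped mutate(). This is only one workaround, not a summarytools feature. The data frame `df` is a hypothetical stand-in rebuilt from the counts printed earlier so the snippet runs without summarytools; with the real data, `tobacco` would take its place.

```r
library(dplyr)

# Hypothetical stand-in for summarytools::tobacco, rebuilt from the
# counts shown in the freq() output (298 smokers, 702 non-smokers).
df <- data.frame(
  smoker = factor(rep(c("Yes", "No"), times = c(298, 702)),
                  levels = c("Yes", "No")),
  diseased = factor(c(rep(c("Yes", "No"), times = c(125, 173)),
                      rep(c("Yes", "No"), times = c(99, 603))),
                    levels = c("Yes", "No"))
)

within_group <- df |>
  count(smoker, diseased, name = "freq") |>  # one row per smoker/diseased pair
  group_by(smoker) |>
  mutate(
    pct     = 100 * freq / sum(freq),        # percentage within each smoker group
    pct_cum = cumsum(pct)                    # cumulative percentage within group
  ) |>
  ungroup()

within_group
```

Because the percentages are computed after group_by(smoker), each group sums to 100 on its own, which matches the per-group tables printed by freq().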
I see what you mean. The proportions are recalculated to take into account both groups, and it can create confusion. Aside from better documenting this, I think an additional parameter is in order. That way the user can decide whether to recalculate proportions or not. Thank you for pointing it out.
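To make the recalculation concrete: judging only from the numbers in the reprex above, the pct values returned by tb() match each within-group % Valid divided by the number of groups (two here), so that the whole table sums to 100. The base R sketch below checks that arithmetic. It is an observation about this particular output, not a description of how tb() is implemented.

```r
# Counts per group, copied from the freq() output in this issue
counts <- list(
  Yes = c(Yes = 125, No = 173),  # smoker = Yes
  No  = c(Yes = 99,  No = 603)   # smoker = No
)

# Within-group percentages, as printed in the "% Valid" column
pct_within <- lapply(counts, function(x) 100 * x / sum(x))

# Assumption inferred from the output: percentages are rescaled by the
# number of groups so that all rows together sum to 100
n_groups <- length(counts)
pct_rescaled <- unlist(pct_within) / n_groups

signif(pct_rescaled, 3)          # compare with the pct column from tb()
signif(cumsum(pct_rescaled), 3)  # compare with the pct_cum column from tb()
```

Rounded to three significant digits, these values line up with the pct and pct_cum columns in the tibble shown earlier, which is why the grouped result no longer resembles the per-group tables.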