dcomtois / summarytools

R Package to Quickly and Neatly Summarize Data

Change of results when using tb() in grouped freq()

Crismoc opened this issue

After getting results from a grouped freq(), I would like to store them in a tibble or data.frame. When I use tb(), the percentages change, which might be unintended behavior:

library(summarytools)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
tobacco |> 
  group_by(smoker) |> 
  freq(diseased)
#> Frequencies  
#> diseased  
#> Type: Factor  
#> Group: smoker = Yes  
#> 
#>               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes    125     41.95          41.95     41.95          41.95
#>          No    173     58.05         100.00     58.05         100.00
#>        <NA>      0                               0.00         100.00
#>       Total    298    100.00         100.00    100.00         100.00
#> 
#> Group: smoker = No  
#> 
#>               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes     99     14.10          14.10     14.10          14.10
#>          No    603     85.90         100.00     85.90         100.00
#>        <NA>      0                               0.00         100.00
#>       Total    702    100.00         100.00    100.00         100.00

tobacco |> 
  group_by(smoker) |> 
  freq(diseased) |> 
  tb(na.rm = TRUE)
#> # A tibble: 4 × 5
#>   smoker diseased  freq   pct pct_cum
#>   <fct>  <fct>    <dbl> <dbl>   <dbl>
#> 1 Yes    Yes        125 21.0     21.0
#> 2 Yes    No         173 29.0     50  
#> 3 No     Yes         99  7.05    57.1
#> 4 No     No         603 42.9    100

Created on 2023-04-19 with reprex v2.0.2

Is there a way to transform the same results to a tibble or data.frame?

Could you please show what the desired resulting data frame would look like?

I would expect to get something like this:

library(summarytools)
library(dplyr)

tobacco |> 
  group_by(smoker) |> 
  reframe(
    level = names(table(diseased)),
    Freq = table(diseased),
    `% Valid` = prop.table(table(diseased)))
#> # A tibble: 4 × 4
#>   smoker level Freq        `% Valid`  
#>   <fct>  <chr> <table[1d]> <table[1d]>
#> 1 Yes    Yes   125         0.4194631  
#> 2 Yes    No    173         0.5805369  
#> 3 No     Yes    99         0.1410256  
#> 4 No     No    603         0.8589744
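
As a possible interim workaround, the same per-group percentages can be computed with dplyr alone, which gives plain numeric columns instead of the table[1d] columns above. This is only a sketch: `dat` is a hypothetical stand-in rebuilt from the counts in this issue so the code runs without summarytools, and the column names freq, pct and pct_cum are chosen to mirror the tb() output rather than taken from the package.

```r
library(dplyr)

# Hypothetical stand-in for summarytools::tobacco, rebuilt from the
# counts reported in this issue (298 smokers, 702 non-smokers)
dat <- data.frame(
  smoker = factor(rep(c("Yes", "No"), times = c(298, 702)),
                  levels = c("Yes", "No")),
  diseased = factor(c(rep(c("Yes", "No"), times = c(125, 173)),
                      rep(c("Yes", "No"), times = c(99, 603))),
                    levels = c("Yes", "No"))
)

res <- dat |>
  count(smoker, diseased, name = "freq") |>  # one row per smoker/diseased pair
  group_by(smoker) |>
  mutate(
    pct     = 100 * freq / sum(freq),        # percentage within each smoker group
    pct_cum = cumsum(pct)                    # cumulative percentage within group
  ) |>
  ungroup()

print(res)
```

With the real data, replacing `dat` by `tobacco` should give the same four rows, provided diseased has no missing values (as the freq() output above suggests).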

I see what you mean. The proportions are recalculated to take both groups into account, which can be confusing. Aside from documenting this better, I think an additional parameter is in order, so that the user can decide whether or not to recalculate proportions. Thank you for pointing it out.
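
To make the recalculation concrete: the tb() values reported in this issue are consistent with the within-group percentages being rescaled so that they sum to 100 across both groups (with two groups, each value is halved), and not with each count being divided by the overall total. The base R sketch below reproduces those numbers from the counts shown earlier. It only illustrates the arithmetic and is not taken from the tb() source; the column names pct_within, pct_rescaled and pct_overall are made up for this example.

```r
# Counts copied from the grouped freq() output in this issue
counts <- data.frame(
  smoker   = c("Yes", "Yes", "No", "No"),
  diseased = c("Yes", "No", "Yes", "No"),
  freq     = c(125, 173, 99, 603)
)

# Within-group percentages, as printed by the grouped freq()
group_totals <- ave(counts$freq, counts$smoker, FUN = sum)
counts$pct_within <- 100 * counts$freq / group_totals

# Within-group percentages rescaled to sum to 100 over all rows;
# these match the pct column of the tb() output (21.0, 29.0, 7.05, 42.9)
counts$pct_rescaled <- 100 * counts$pct_within / sum(counts$pct_within)

# Share of the overall total (1000 rows), shown for comparison only;
# these do NOT match the tb() output
counts$pct_overall <- 100 * counts$freq / sum(counts$freq)

print(counts)
```

Whichever behavior a new parameter ends up selecting, pct_within is what the original poster expected to get back.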