strengejacke / sjmisc

Data transformation and utility functions for R

Home Page: https://strengejacke.github.io/sjmisc

Incorrect group names with multiple groups in frq()

jmbarajas opened this issue

I'm trying to calculate frequency tables for a grouped tibble, using two grouping variables with two levels each. frq() appears to mislabel some of the group headers and to leave others unlabeled (NA). See the following example:

library(tibble)
library(dplyr)
library(sjmisc)

set.seed(1001)

df <- tibble(group1 = factor(c(rep(1, 50), rep(2, 50)),  
                             labels = c("Group A", "Group B")), 
             group2 = factor(rep(1:2, 50), labels = c("Group X", "Group Y")), 
             values = factor(as.integer(runif(100, 1, 6)), 
                             labels = c("Never", "Once per month",  
                                        "Twice per month",  
                                        "Once per week", "Once per day")))

df %>% group_by(group1, group2) %>% frq(values)

#> 
#> Grouped by:
#> group1: Group A
#> group2: Group X
#>  
#> # values <categorical> 
#> # total N=25  valid N=25  mean=3.16  sd=1.46
#>  
#>              val frq raw.prc valid.prc cum.prc
#>            Never   5      20        20      20
#>   Once per month   3      12        12      32
#>  Twice per month   6      24        24      56
#>    Once per week   5      20        20      76
#>     Once per day   6      24        24     100
#>             <NA>   0       0        NA      NA
#> 
#> Grouped by:
#> group1: Group B
#> group2: Group Y
#>  
#> # values <categorical> 
#> # total N=25  valid N=25  mean=2.68  sd=1.52
#>  
#>              val frq raw.prc valid.prc cum.prc
#>            Never   8      32        32      32
#>   Once per month   5      20        20      52
#>  Twice per month   3      12        12      64
#>    Once per week   5      20        20      84
#>     Once per day   4      16        16     100
#>             <NA>   0       0        NA      NA
#> 
#> Grouped by:
#> group1: NA
#> group2: NA
#>  
#> # values <categorical> 
#> # total N=25  valid N=25  mean=2.96  sd=1.37
#>  
#>              val frq raw.prc valid.prc cum.prc
#>            Never   5      20        20      20
#>   Once per month   4      16        16      36
#>  Twice per month   7      28        28      64
#>    Once per week   5      20        20      84
#>     Once per day   4      16        16     100
#>             <NA>   0       0        NA      NA
#> 
#> Grouped by:
#> group1: NA
#> group2: NA
#>  
#> # values <categorical> 
#> # total N=25  valid N=25  mean=3.04  sd=1.43
#>  
#>              val frq raw.prc valid.prc cum.prc
#>            Never   5      20        20      20
#>   Once per month   4      16        16      36
#>  Twice per month   6      24        24      60
#>    Once per week   5      20        20      80
#>     Once per day   5      20        20     100
#>             <NA>   0       0        NA      NA

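Note that the only group headers printed are the first levels of group1 and group2 together (Group A, Group X) and the second levels together (Group B, Group Y), i.e. the diagonal of a 2x2 crosstab; the other two tables are labeled NA. Comparing the output to xtabs() shows that the second frequency table is also mislabeled: its counts match Group A and Group Y, not Group B and Group Y. The two tables labeled NA match Group B with Group X and Group B with Group Y, respectively.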

xtabs(~values + group2 + group1, df)
#> , , group1 = Group A
#> 
#>                  group2
#> values            Group X Group Y
#>   Never                 5       8
#>   Once per month        3       5
#>   Twice per month       6       3
#>   Once per week         5       5
#>   Once per day          6       4
#> 
#> , , group1 = Group B
#> 
#>                  group2
#> values            Group X Group Y
#>   Never                 5       5
#>   Once per month        4       4
#>   Twice per month       7       6
#>   Once per week         5       5
#>   Once per day          4       5
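
As a further cross-check that does not go through frq(), the per-group frequencies can be computed directly with dplyr. This is only a sketch: the names `counts` and `prc` are made up for illustration, and it assumes a dplyr version in which count() accepts the `.drop` argument.

```r
library(tibble)
library(dplyr)

# Same data as in the example above
set.seed(1001)
df <- tibble(group1 = factor(c(rep(1, 50), rep(2, 50)),
                             labels = c("Group A", "Group B")),
             group2 = factor(rep(1:2, 50), labels = c("Group X", "Group Y")),
             values = factor(as.integer(runif(100, 1, 6)),
                             labels = c("Never", "Once per month",
                                        "Twice per month",
                                        "Once per week", "Once per day")))

# One row per group1 x group2 x values combination;
# .drop = FALSE keeps combinations that have a count of zero
counts <- df %>%
  count(group1, group2, values, .drop = FALSE) %>%
  group_by(group1, group2) %>%
  mutate(prc = 100 * n / sum(n)) %>%   # percentage within each group pair
  ungroup()

print(counts, n = Inf)
```

Each row carries its own group1 and group2 values, so the counts can be matched to the frq() tables without relying on the printed group headers.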

I'm using the latest CRAN release of sjmisc (v 2.7.9).

Oops: I see this was solved in the latest development build. Closing the issue.
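
For anyone else hitting this on the CRAN release, one way to try the development build is to install it from GitHub. This is a sketch that assumes the remotes package is available and that the development version lives in the strengejacke/sjmisc repository named at the top of this page.

```r
# install.packages("remotes")   # uncomment if remotes is not installed yet
remotes::install_github("strengejacke/sjmisc")

# Confirm which version is now installed
packageVersion("sjmisc")
```

After restarting R and reloading sjmisc, re-running the group_by() example above should show whether the group headers are now printed correctly.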