strengejacke / sjlabelled

Working with Labelled Data in R

Home page: https://strengejacke.github.io/sjlabelled

Duplicate value labels in `tab_stackfrq` when using `sjlabelled::as_label(keep.labels = TRUE)`

dannyparsons opened this issue

`tab_stackfrq` is really nice for displaying rating-scale (e.g. Likert) data.

However, I found a problem when using it with factor columns that have retained their value labels: in this case, each value label appears twice in the columns of the table.

For example, this works fine initially.

```r
library(sjlabelled)
library(sjPlot)

likert_4 <- data.frame(
  q1 = sample(1:4, 500, replace = TRUE, prob = c(0.2, 0.3, 0.1, 0.4)),
  q2 = sample(1:4, 500, replace = TRUE, prob = c(0.5, 0.25, 0.15, 0.1)),
  q3 = sample(1:4, 500, replace = TRUE, prob = c(0.25, 0.1, 0.4, 0.25))
)
labs <- c("Independent" = 1, "Slightly dependent" = 2,
          "Dependent" = 3, "Severely dependent" = 4)

likert_4$q1 <- sjlabelled::add_labels(likert_4$q1, labels = labs)
likert_4$q2 <- sjlabelled::add_labels(likert_4$q2, labels = labs)
likert_4$q3 <- sjlabelled::add_labels(likert_4$q3, labels = labs)

sjPlot::tab_stackfrq(items = likert_4)
#     Independent  Slightly dependent  Dependent  Severely dependent
# q1  20.20 %      31.00 %             9.80 %     39.00 %
# q2  49.40 %      25.20 %             17.00 %    8.40 %
# q3  29.40 %      9.20 %              35.00 %    26.40 %
```

But if the labelled numeric columns are then converted to factors while retaining the labels, the table shows every label twice, and the duplicate columns all have zero frequencies.

```r
# Convert each labelled numeric column to a factor, keeping the value labels
likert_4$q1 <- sjlabelled::as_label(likert_4$q1, keep.labels = TRUE)
likert_4$q2 <- sjlabelled::as_label(likert_4$q2, keep.labels = TRUE)
likert_4$q3 <- sjlabelled::as_label(likert_4$q3, keep.labels = TRUE)

sjPlot::tab_stackfrq(items = likert_4)
#     Independent  Slightly dependent  Dependent  Severely dependent  Dependent  Independent  Severely dependent  Slightly dependent
# q1  20.20 %      31.00 %             9.80 %     39.00 %             0.00 %     0.00 %       0.00 %              0.00 %
# q2  49.40 %      25.20 %             17.00 %    8.40 %              0.00 %     0.00 %       0.00 %              0.00 %
# q3  29.40 %      9.20 %              35.00 %    26.40 %             0.00 %     0.00 %       0.00 %              0.00 %
```
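Why are the extra columns all 0.00 %? Here is a rough base-R model, not checked against sjPlot's source. It assumes that each table column corresponds to one value returned by `get_labels(values = "n")`, and that the item's numeric codes are counted against those values. The responses in `codes` are made up for illustration.

```r
# Hypothetical model of the zero-frequency columns; NOT sjPlot's implementation.
codes <- c(1, 2, 2, 4, 4, 4, 3, 1)   # made-up item responses (numeric codes)

# Column "values": the four numeric codes plus the four level texts
values <- c("1", "2", "3", "4",
            "Dependent", "Independent", "Severely dependent", "Slightly dependent")

# Count codes per value; no code ever equals a level text, so those stay at 0
counts <- table(factor(as.character(codes), levels = values))
pct <- round(100 * prop.table(counts), 2)
pct
```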

I often need to keep the labels so that I can convert back to numeric in the same way.
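To show why keeping the labels matters for that round trip, here is a small base-R sketch. `to_codes()` is a hypothetical helper, not part of sjlabelled; `sjlabelled::as_numeric()` is the package's own tool for this conversion. The factor is built by hand to resemble what `as_label(keep.labels = TRUE)` produces: the levels are the label texts and a numeric `labels` attribute is kept.

```r
labs <- c("Independent" = 1, "Slightly dependent" = 2,
          "Dependent" = 3, "Severely dependent" = 4)
x <- c(4, 1, 2, 3, 1)

# Numeric -> factor, with the value labels kept as an attribute
f <- factor(names(labs)[match(x, labs)], levels = names(labs))
attr(f, "labels") <- labs

# Hypothetical helper: factor -> numeric by looking up each level text
# in the kept "labels" attribute
to_codes <- function(f) unname(attr(f, "labels")[as.character(f)])
to_codes(f)
```

Without the `labels` attribute, the lookup would have nothing to map the level texts back to, so the original codes could not be recovered this way.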

The problem seems to originate from the call to `sjlabelled::get_labels()` in `tab_stackfrq()`, which returns the duplicates:

```r
sjlabelled::get_labels(
  likert_4$q1,
  attr.only = FALSE,
  values = "n",
  non.labelled = TRUE
)
#                    1                    2                    3                    4            Dependent          Independent   Severely dependent   Slightly dependent 
#        "Independent" "Slightly dependent"          "Dependent" "Severely dependent"          "Dependent"        "Independent" "Severely dependent" "Slightly dependent" 
```
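One way to picture where the eight entries come from is a rough base-R model. This is an assumption, not sjlabelled's actual code: the numeric `labels` attribute supplies four entries named `"1"` to `"4"`, and the factor levels are then treated as extra values. Because no level text matches a numeric code, all four levels count as "non-labelled" and get appended.

```r
# Hypothetical base-R model of the duplication; NOT sjlabelled's actual code.
labs <- c("Independent" = 1, "Slightly dependent" = 2,
          "Dependent" = 3, "Severely dependent" = 4)

# A factor shaped like the result of as_label(keep.labels = TRUE):
# levels are the label texts, and the numeric "labels" attribute is kept.
q1 <- factor(c("Independent", "Dependent", "Severely dependent"),
             levels = names(labs))
attr(q1, "labels") <- labs

# Labelled values: label texts named by their numeric codes ("1", "2", ...)
lab_attr <- attr(q1, "labels")
labelled <- setNames(names(lab_attr), lab_attr)

# Assumption: levels that do not match any numeric code are "non-labelled"
non_labelled <- sort(setdiff(levels(q1), as.character(lab_attr)))
all_labels <- c(labelled, setNames(non_labelled, non_labelled))
all_labels
```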

Maybe there could be a separate case for factors with labels? With `non.labelled = FALSE`, `get_labels()` returns the correct labels, but I guess this change may affect other cases.
It would be great if `tab_stackfrq` could work with labelled factor columns, as keeping the labels is really useful when you often switch between numeric and factor.
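One possible shape for such a special case, sketched in base R: drop any entry whose label text has already appeared, keeping the first (value-coded) occurrence. `dedupe_labels()` is a hypothetical helper, not part of sjlabelled or sjPlot, and the input is simply the eight-entry vector from the `get_labels()` output above.

```r
# Hypothetical helper: keep only the first occurrence of each label text
dedupe_labels <- function(x) {
  x[!duplicated(unname(x))]
}

# The eight labels returned by get_labels() for the labelled factor
dup <- c("1" = "Independent", "2" = "Slightly dependent",
         "3" = "Dependent", "4" = "Severely dependent",
         "Dependent" = "Dependent", "Independent" = "Independent",
         "Severely dependent" = "Severely dependent",
         "Slightly dependent" = "Slightly dependent")

dedupe_labels(dup)
```

Deduplicating on the label text (rather than the names) keeps the numeric codes as names, which is what the first table relies on.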
Many thanks!

Thanks, minimal reprex here:

```r
library(sjlabelled)

q1 <- sample(1:4, 500, replace = TRUE, prob = c(0.2, 0.3, 0.1, 0.4))

labs <- c("Independent" = 1, "Slightly dependent" = 2,
          "Dependent" = 3, "Severely dependent" = 4)

q1 <- sjlabelled::add_labels(q1, labels = labs)
q1 <- sjlabelled::as_label(q1, keep.labels = TRUE)

sjlabelled::get_labels(
  q1,
  attr.only = FALSE,
  values = "n",
  non.labelled = TRUE
)
#>                    1                    2                    3 
#>        "Independent" "Slightly dependent"          "Dependent" 
#>                    4            Dependent          Independent 
#> "Severely dependent"          "Dependent"        "Independent" 
#>   Severely dependent   Slightly dependent 
#> "Severely dependent" "Slightly dependent"
```

Created on 2021-11-26 by the reprex package (v2.0.1)