Duplicate value labels in `tab_stackfrq` when using `sjlabelled::as_label(keep.labels = TRUE)`
dannyparsons opened this issue
`tab_stackfrq` is really nice for displaying rated data.
However, I found a problem when using it with factor columns that have retained their value labels. In this case, the value labels appear twice in the columns of the table.
For example, this works fine initially.
library(sjlabelled)
library(sjPlot)
likert_4 <- data.frame(
  q1 = sample(1:4, 500, replace = TRUE, prob = c(0.2, 0.3, 0.1, 0.4)),
  q2 = sample(1:4, 500, replace = TRUE, prob = c(0.5, 0.25, 0.15, 0.1)),
  q3 = sample(1:4, 500, replace = TRUE, prob = c(0.25, 0.1, 0.4, 0.25))
)
labs <- c("Independent" = 1, "Slightly dependent" = 2,
          "Dependent" = 3, "Severely dependent" = 4)
likert_4$q1 <- sjlabelled::add_labels(likert_4$q1, labels = labs)
likert_4$q2 <- sjlabelled::add_labels(likert_4$q2, labels = labs)
likert_4$q3 <- sjlabelled::add_labels(likert_4$q3, labels = labs)
sjPlot::tab_stackfrq(items = likert_4)
#    Independent Slightly dependent Dependent Severely dependent
# q1     20.20 %            31.00 %    9.80 %            39.00 %
# q2     49.40 %            25.20 %   17.00 %             8.40 %
# q3     29.40 %             9.20 %   35.00 %            26.40 %
But if the labelled numeric columns are then converted to factors while retaining the labels, the table shows a duplicate column for each value label, with zero frequencies in the duplicates.
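likert_4$q1 <- sjlabelled::as_label(likert_4$q1, keep.labels = TRUE)
# Note: q2 and q3 are (mistakenly) converted from q1 here, which is why
# all three rows in the output below are identical.
likert_4$q2 <- sjlabelled::as_label(likert_4$q1, keep.labels = TRUE)
likert_4$q3 <- sjlabelled::as_label(likert_4$q1, keep.labels = TRUE)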
sjPlot::tab_stackfrq(items = likert_4)
#    Independent Slightly dependent Dependent Severely dependent Dependent Independent Severely dependent Slightly dependent
# q1     20.20 %            31.00 %    9.80 %            39.00 %    0.00 %      0.00 %             0.00 %             0.00 %
# q2     20.20 %            31.00 %    9.80 %            39.00 %    0.00 %      0.00 %             0.00 %             0.00 %
# q3     20.20 %            31.00 %    9.80 %            39.00 %    0.00 %      0.00 %             0.00 %             0.00 %
I often find it important to keep the labels to allow me to convert back to numeric in the same way.
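To illustrate what I mean by converting back, here is a rough base R sketch. It assumes the retained labels are available as a named vector (label text to numeric code) in a `labels` attribute; that layout and the `codes` vector are my own assumptions for illustration, not necessarily how sjlabelled stores things internally.

```r
# Label text -> numeric code, as in the example above.
labs <- c("Independent" = 1, "Slightly dependent" = 2,
          "Dependent" = 3, "Severely dependent" = 4)

# Hypothetical raw responses (illustration only).
codes <- c(3, 1, 4, 2, 1)

# Convert to a factor whose levels are the label texts, and keep the
# label mapping as an attribute (assumed layout).
f <- factor(codes, levels = labs, labels = names(labs))
attr(f, "labels") <- labs

# Convert back by looking up each level name in the retained mapping.
# as.integer(f) would only give the level positions, which match the
# original codes only when the codes happen to be 1, 2, ..., k.
back <- unname(attr(f, "labels")[as.character(f)])
```

Without the retained mapping, the original codes can only be recovered when they coincide with the level positions.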
The problem seems to originate from the call to `sjlabelled::get_labels()` here, which returns the duplicates, i.e.:
sjlabelled::get_labels(
  likert_4$q1,
  attr.only = FALSE,
  values = "n",
  non.labelled = TRUE
)
#                    1                    2                    3                    4
#        "Independent" "Slightly dependent"          "Dependent" "Severely dependent"
#            Dependent          Independent   Severely dependent   Slightly dependent
#          "Dependent"        "Independent" "Severely dependent" "Slightly dependent"
Maybe there could be a separate case here for factors with labels? If `non.labelled = FALSE`, then it returns the correct labels, but I guess this change may affect other cases?
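My guess at the mechanism, as a base R sketch only (this is not the actual sjlabelled code, and the attribute layout is an assumption): the labels attribute is keyed by the numeric codes 1 to 4, while the values observed in the factor are the label strings, so none of them match and all of them get appended as "non-labelled" values.

```r
# Assumed stand-in for a labelled factor: levels are the label texts and
# a "labels" attribute maps label text -> original numeric code.
labs <- c("Independent" = 1, "Slightly dependent" = 2,
          "Dependent" = 3, "Severely dependent" = 4)
q1 <- factor(c("Dependent", "Independent", "Severely dependent",
               "Slightly dependent", "Independent"),
             levels = names(labs))
attr(q1, "labels") <- labs

# Labels from the attribute, named by their numeric codes ("1".."4").
from_attr <- stats::setNames(names(labs), labs)

# Values observed in the data; for a factor these are the level strings.
observed <- sort(unique(as.character(q1)))

# None of the observed strings match the codes, so all count as unlabelled.
unlabelled <- setdiff(observed, names(from_attr))

# Appending them gives the same duplicated pattern as in the output above.
combined <- c(from_attr, stats::setNames(unlabelled, unlabelled))
names(combined)
```

If that is roughly what happens, comparing the observed values against the label texts (rather than the codes) when the input is a factor might be enough to avoid the duplicates.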
It would be great if `tab_stackfrq` could work with labelled factor columns, as it's really useful to keep the labels when you are often switching between numeric and factor.
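In the meantime, a plain base R fallback along these lines gives the same kind of row-percentage table directly from the factor levels, without any label lookup. This is only a sketch: `stack_freq` is a hypothetical helper name, the data and seed are made up, and it assumes every column is a factor with the same levels. It does not reproduce the HTML output of `tab_stackfrq`.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

labs <- c("Independent", "Slightly dependent", "Dependent", "Severely dependent")

# Made-up factor items sharing the same levels.
items <- data.frame(
  q1 = factor(sample(labs, 500, replace = TRUE), levels = labs),
  q2 = factor(sample(labs, 500, replace = TRUE), levels = labs),
  q3 = factor(sample(labs, 500, replace = TRUE), levels = labs)
)

# Hypothetical helper: percentage of each level per item, one row per item.
stack_freq <- function(items) {
  lev <- levels(items[[1]])
  pct <- vapply(
    items,
    function(col) as.numeric(100 * prop.table(table(factor(col, levels = lev)))),
    numeric(length(lev))
  )
  rownames(pct) <- lev
  t(round(pct, 2))
}

res <- stack_freq(items)
res
```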
Many thanks!
Thanks, minimal reprex here:
library(sjlabelled)
q1 <- sample(1:4, 500, replace = TRUE, prob = c(0.2, 0.3, 0.1, 0.4))
labs <- c("Independent" = 1, "Slightly dependent" = 2,
          "Dependent" = 3, "Severely dependent" = 4)
q1 <- sjlabelled::add_labels(q1, labels = labs)
q1 <- sjlabelled::as_label(q1, keep.labels = TRUE)
sjlabelled::get_labels(
  q1,
  attr.only = FALSE,
  values = "n",
  non.labelled = TRUE
)
#> 1 2 3
#> "Independent" "Slightly dependent" "Dependent"
#> 4 Dependent Independent
#> "Severely dependent" "Dependent" "Independent"
#> Severely dependent Slightly dependent
#> "Severely dependent" "Slightly dependent"
Created on 2021-11-26 by the reprex package (v2.0.1)