different behavior for character vs. numeric data, when using named vector for labels argument of set_labels()?
jmobrien opened this issue
Using 1.1.4.
The documentation for set_labels suggests that the format for a named vector being passed to the labels argument is:
c([desiredlabel1] = [datavalue1], ..., [desiredlabel{n}] = [datavalue{n}])
e.g., from the Examples for set_labels():
# assign labels with named vector
dummy <- sample(1:4, 40, replace = TRUE)
dummy <- set_labels(dummy, labels = c("very low" = 1, "very high" = 4))
This implies that the actual data values to be labelled are the elements of the labels vector, and the labels to be applied are the names of those elements. However, in practice, this gets reversed when the thing being labelled is a character vector.
See below:
# numeric version
numvec <-
set_labels(1:4,
labels = c(a = 1, b = 2, c = 3, d = 4)
)
numvec
[1] 1 2 3 4
attr(,"labels")
a b c d <-- labels come from the names of the vector passed to "labels"
1 2 3 4 <-- data values that are labelled come from the elements of the vector passed to "labels"
get_labels(numvec)
# character version
charvec <-
set_labels(c("one", "two", "three", "four"),
labels = c(a = "one", b = "two", c = "three", d = "four")
)
charvec
[1] "one" "two" "three" "four"
attr(,"labels")
one two three four <-- here, labels come from the *elements* of vector passed to "labels"
"a" "b" "c" "d" <-- meanwhile, data values that are labelled come from the *names* of vector passed to "labels"
When this is done with character vectors, get_labels() then produces the wrong values for "labels", returning the values in the data instead:
# This is fine:
get_labels(numvec)
[1] "a" "b" "c" "d"
# This is not:
get_labels(charvec)
[1] "one" "two" "three" "four" <-- again, these are the values, not the labels
Is this a mistake, or is there something about intended behavior I'm not understanding?
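To make the two orientations concrete without relying on any package, here is a minimal base-R sketch. It assumes a getter that simply reads the names of the "labels" attribute; read_labels is a hypothetical helper for illustration, not sjlabelled's implementation.

```r
# Hypothetical helper (not part of sjlabelled): read labels the way the
# documentation implies, i.e. from the names of the "labels" attribute.
read_labels <- function(x) names(attr(x, "labels"))

# Documented orientation: names are labels, elements are data values.
documented <- structure(
  c("one", "two", "three", "four"),
  labels = c(a = "one", b = "two", c = "three", d = "four")
)

# Flipped orientation, as in the charvec output above.
flipped <- structure(
  c("one", "two", "three", "four"),
  labels = c(one = "a", two = "b", three = "c", four = "d")
)

read_labels(documented)  # "a" "b" "c" "d"
read_labels(flipped)     # "one" "two" "three" "four"
```

With the flipped attribute, any getter of this kind would hand back the data values rather than the labels, which matches what I see from get_labels(charvec).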
It's showing up as an issue for me b/c I have a situation where the labels are generally serving multiple roles for Stata compatibility and helping merge datasets, but in a few cases I also want to provide metadata about more complex classification to my userbase, e.g.:
set_labels(c("TT", "CC", "TC", "CT", "TX", "XT", "CX", "XC", "XX"),
labels = c(Treatment = "TT", Control = "CC",
Mixed = "CT", Mixed = "TC",
PartialTreatment = "TX", PartialTreatment = "XT",
PartialControl = "CX", PartialControl = "XC",
Missing = "XX")
)
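For illustration, this is roughly the many-to-one lookup I am after, sketched in base R only. It assumes the names-are-labels orientation from the documentation and that PartialControl is meant to cover "CX" and "XC"; label_of is a hypothetical helper, not a function from any package.

```r
codes <- c("TT", "CC", "TC", "CT", "TX", "XT", "CX", "XC", "XX")

# Names are labels, elements are data values; names may repeat.
labels <- c(
  Treatment = "TT", Control = "CC",
  Mixed = "CT", Mixed = "TC",
  PartialTreatment = "TX", PartialTreatment = "XT",
  PartialControl = "CX", PartialControl = "XC",
  Missing = "XX"
)

# Attach the labels manually as a plain attribute.
labelled_codes <- structure(codes, labels = labels)

# Hypothetical helper: look up the label for each data value.
label_of <- function(x) {
  lab <- attr(x, "labels")
  names(lab)[match(as.vector(x), lab)]
}

label_of(labelled_codes)
```

Because several data values share one label, the lookup has to go from value to label; that only works if the values are the elements and the labels are the names.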
Just following up about this after a while. I'd like to use the tool more in my workflow, but as it stands I'm just having to set things up manually.
I still don't understand the seemingly inconsistent behavior when labeling numeric vs. character data types. Looking at the code this behavior appears to be intentional, with the linked section doing the flip if things match (i.e., string data is given string labels).
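As far as I can tell, the flip amounts to swapping the names and the elements of the labels vector. A base-R sketch of that operation follows; flip_named is a hypothetical helper written for this comment, not the package's actual code.

```r
# Hypothetical helper: swap the names and elements of a named character vector.
flip_named <- function(x) stats::setNames(names(x), unname(x))

labels <- c(a = "one", b = "two", c = "three", d = "four")

# After the flip, the former elements become the names and vice versa.
flipped <- flip_named(labels)
flipped

# Flipping twice restores the original vector.
identical(flip_named(flipped), labels)
```

This reproduces the attribute shown for charvec above: names "one" to "four" with elements "a" to "d".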
If there's a mismatch because the labels are given numeric elements with string names, it ignores those and throws a warning, like so:
example <- sample(c("one", "two", "three", "four"), 40, replace = TRUE)
example <- set_labels(example, labels = c("one" = 1, "two" = 2, "three" = 3, "four" = 4))
example # unlabelled
However, a labels argument structured like the above would already be improper based on the guidance in the help docs for set_labels.
In fact, it even looks like a similar mistake would be fixed if it were made when labeling a numeric vector, i.e. here
Is there a purpose to this behavior I'm not understanding? Perhaps something to do with Haven, etc.?
I think there must have been a reason to do this, but I can't remember. I changed the behaviour, so it is now in line with the default haven behaviour:
charvec <- sjlabelled::set_labels(
c("one", "two", "three", "four"),
labels = c(a = "one", b = "two", c = "three", d = "four")
)
charvec2 <- haven::labelled(
c("one", "two", "three", "four"),
labels = c(a = "one", b = "two", c = "three", d = "four")
)
charvec
#> [1] "one" "two" "three" "four"
#> attr(,"labels")
#> a b c d
#> "one" "two" "three" "four"
charvec2
#> <labelled<character>[4]>
#> [1] one two three four
#>
#> Labels:
#> value label
#> one a
#> two b
#> three c
#> four d
sjlabelled::get_labels(charvec, value = "p")
#> [1] "[four] d" "[one] a" "[three] c" "[two] b"
sjlabelled::get_labels(charvec2, value = "p")
#> [1] "[four] d" "[one] a" "[three] c" "[two] b"
Created on 2021-05-11 by the reprex package (v2.0.0)
Excellent, thanks for your help on this.
And thanks for all your work on this package overall!
Just a small follow-up on this: is there a plan to release a new version of the package to CRAN any time soon? Just asking so I can plan. Much of my work is done on a managed server, and the policy is generally to use only official CRAN packages.
I just submitted an update to CRAN, and this resolved issue should be in the official CRAN release.