Retrieving original numeric values to make as_numeric(as_label(x)) consistent
marianschmidt opened this issue
Labels based on named vectors are very handy for reliably converting numeric values to factors with correctly named factor levels. However, I cannot currently find a way to retrieve the original numeric values of the labels when converting a labelled factor back to numeric.
I had assumed that the use.labels = TRUE option does this, but it turns out I simply misunderstood its functionality.
Is there a way to create an exact reverse of the as_label() function, so that x and as_numeric(as_label(x)) are the same?
Example code:
library(sjlabelled)
# Create data frame x and set value labels
x <- data.frame(a = c(0, 1, 0))
x$a <- set_labels(x$a, labels = c(`null` = 0, `one` = 1))
get_labels(x$a)
as_numeric(as_label(x$a))
x$a
#option use.labels fails in this example because labels are non-numeric
as_numeric(as_label(x$a), use.labels = TRUE)
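For context, the same loss happens with plain R factors: as.numeric() on a factor returns the internal level codes, and the familiar as.numeric(levels(f))[f] idiom recovers the original values only while the levels themselves are the numbers. A base-R sketch, independent of sjlabelled (the variable names are illustrative only):

```r
# Base R only; no sjlabelled functions are used here.
v <- c(0, 1, 4)

# Factor whose levels are the numbers themselves
f_num  <- factor(v)
codes  <- as.numeric(f_num)                 # internal level codes: 1 2 3
values <- as.numeric(levels(f_num))[f_num]  # original values: 0 1 4

# Factor whose levels are names instead of numbers (similar to what as_label() returns)
f_lab <- factor(v, levels = c(0, 1, 4), labels = c("null", "one", "four"))
lost  <- suppressWarnings(as.numeric(levels(f_lab))[f_lab])  # NA NA NA: values are gone
```

Once the levels hold names, the factor alone no longer contains the original numbers; they have to come from somewhere else, such as a label attribute.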
The problem is that as_label() drops all information about the label-value associations. It works if you add the labels back:
f <- as_label(x$a)
f <- set_labels(f, labels = c(null = 0, one = 1))
as_numeric(f, use.labels = TRUE)
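Conceptually, the workaround works because the factor then carries a named numeric "labels" attribute that maps each level name back to its value. A minimal base-R sketch of that lookup; values_from_labels() is a hypothetical helper written for illustration, not sjlabelled's implementation of use.labels = TRUE:

```r
# Hypothetical helper, not part of sjlabelled.
# Looks up each factor level in a named numeric "labels" attribute
# (names = level text, elements = original numbers) and returns the numbers.
values_from_labels <- function(f) {
  labs <- attr(f, "labels", exact = TRUE)
  if (!is.numeric(labs) || is.null(names(labs))) {
    stop("`f` needs a named numeric \"labels\" attribute")
  }
  vals <- unname(labs[as.character(f)])
  # A non-missing level without a matching label would otherwise silently become NA
  if (anyNA(vals[!is.na(f)])) stop("some factor levels have no matching label")
  vals
}

f <- factor(c("null", "one", "null"), levels = c("null", "one"))
attr(f, "labels") <- c(null = 0, one = 1)
res <- values_from_labels(f)  # 0 1 0
```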
I could make as_label() add the values as labels; then it should work without the set_labels() step in between.
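One way to picture this proposal: when building the factor, keep the original value-label pairs as a named numeric attribute instead of discarding them. A hedged base-R sketch; as_factor_keep_values() is a hypothetical stand-in for as_label() with labels kept, and it assumes x carries a named numeric "labels" attribute like the one set_labels() creates for numeric vectors:

```r
# Hypothetical stand-in, not sjlabelled's as_label().
as_factor_keep_values <- function(x) {
  labs <- attr(x, "labels", exact = TRUE)
  stopifnot(is.numeric(labs), !is.null(names(labs)))
  labs <- labs[order(labs)]              # order levels by value, keeping names
  f <- factor(as.vector(x),              # as.vector() drops the attributes of x
              levels = unname(labs),
              labels = names(labs))
  attr(f, "labels") <- labs              # keep the name -> value pairs
  f
}

x <- c(0, 1, 0)
attr(x, "labels") <- c(null = 0, one = 1)
y <- as_factor_keep_values(x)
str(y)
```

Because the attribute still stores "null" = 0 and "one" = 1, a later conversion back to numeric can use these values instead of the level codes.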
Great, I did not know that. Yes, I would say as_label() should either (i) have a keep.labels option, (ii) have an equivalent function that automatically keeps labels, or (iii) keep labels by default. I don't know whether (iii) might break code for other users; on the other hand, it would be consistent with as_numeric(), where keep.labels = TRUE by default.
And I guess use.labels = TRUE works as I originally understood it, using the right-hand numeric part of the value labels. Maybe you could improve the documentation of this argument from:
Logical, if TRUE and x has numeric value labels, these value labels will be set as numeric values.
to something along the lines of:
If TRUE and x has numeric value labels, the values defined in the labels (the right-hand side of labels = c(null = 0, one = 1)) will be set as numeric values (instead of consecutive factor level numbers).
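The distinction drawn in the proposed wording becomes visible with non-consecutive values such as 0, 1, 4. A base-R sketch (variable names are illustrative only): the level numbers are always 1, 2, 3, ..., while the label values come from the right-hand side of the labels.

```r
# Base R illustration of label values vs. consecutive factor level numbers
labs <- c(null = 0, one = 1, four = 4)
f <- factor(c("four", "null", "one", "four"), levels = names(labs))

level_numbers <- as.integer(f)                           # 3 1 2 3
label_values  <- unname(labs[levels(f)])[level_numbers]  # 4 0 1 4
```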
use.labels = TRUE does not quite work here, because values and value names are reversed in the label attributes. But it now works like this:
library(sjlabelled)
x <- data.frame(a = c(0, 1, 0))
x$a <- set_labels(x$a, labels = c(`null` = 0, `one` = 1))
as_numeric(as_label(x$a))
#> [1] 1 2 1
#> attr(,"labels")
#> null one
#> 1 2
Created on 2019-03-03 by the reprex package (v0.2.1)
To me this behavior is very counterintuitive and potentially dangerous, especially when labelled data are used in later calculations. I would expect as_numeric(as_label(x$a, keep.labels = TRUE)) not to change the value labels automatically: after defining "null" = 0, I would expect "null" to stay 0 rather than become 1.
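This expectation can be phrased as a round-trip property: going from numbers to a labelled factor and back should return the original numbers, also when the values have gaps or contain NA. A small base-R check; round_trips() is a hypothetical helper for illustration only:

```r
# Hypothetical helper: TRUE if numeric -> labelled factor -> numeric returns x
round_trips <- function(x, labs) {
  f <- factor(x, levels = unname(labs), labels = names(labs))
  back <- unname(labs[as.character(f)])  # map level names back to their values
  identical(back, x)
}

round_trips(c(0, 1, 0), c(null = 0, one = 1))                # TRUE
round_trips(c(0, 1, 4, NA), c(null = 0, one = 1, four = 4))  # TRUE
round_trips(c(0, 2), c(null = 0))                            # FALSE: 2 has no label
```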
library(sjlabelled)
x <- data.frame(a = c(0, 1, 0))
x$a <- set_labels(x$a, labels = c(`null` = 0, `one` = 1))
as_numeric(as_label(x$a, keep.labels = TRUE))
#> [1] 1 2 1
#> attr(,"labels")
#> null one
#> 1 2
From the current documentation, I thought as_numeric(as_label(x$a, keep.labels = TRUE), use.labels = TRUE) would work, but it throws an error, probably because value names and values are reversed, as you mentioned.
The problem can be demonstrated with this example:
library(sjlabelled)
x <- factor(c("None", "Little", "Some", "Lots"))
x <- set_labels(
x,
labels = c(None = "0.5", Little = "1.3", Some = "1.8", Lots = ".2")
)
str(x)
#> Factor w/ 4 levels "Little","Lots",..: 3 1 4 2
#> - attr(*, "labels")= Named num [1:4] 0.2 0.5 1.3 1.8
#> ..- attr(*, "names")= chr [1:4] "Lots" "None" "Little" "Some"
get_labels(x)
#> [1] "Lots" "None" "Little" "Some"
y <- c(0, 1, 0)
y <- set_labels(y, labels = c(`null` = 0, `one` = 1))
y <- as_label(y, keep.labels = TRUE)
str(y)
#> Factor w/ 2 levels "null","one": 1 2 1
#> - attr(*, "labels")= Named chr [1:2] "null" "one"
#> ..- attr(*, "names")= chr [1:2] "0" "1"
get_labels(y)
#> [1] "0" "1"
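Converting the reversed form shown for y (names hold the values, elements hold the label text) into the named numeric form used for numeric vectors only requires swapping names and elements. A base-R sketch, assuming every name parses as a number:

```r
# Reversed form: names are the values, elements are the label text
rev_labs <- c(`0` = "null", `1` = "one")

# Swap into the named numeric form: names are the label text, elements are the values
vals <- as.numeric(names(rev_labs))
stopifnot(!anyNA(vals))                 # every name must parse as a number
labs <- setNames(vals, unname(rev_labs))
labs
```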
This is the "typical" behaviour. But I see that it's probably unintended here, so I might make an exception for as_label(), so that as_numeric(as_label()) would work.
Do you think it would make sense to always store the labels as a named numeric vector, regardless of the object's actual data type, so that:
y <- c(0, 1, 0)
y <- set_labels(y, labels = c(`null` = 0, `one` = 1))
y <- as_label(y, keep.labels = TRUE)
str(y)
#> Factor w/ 2 levels "null","one": 1 2 1
#> - attr(*, "labels")= Named num [1:2] 0 1
#> ..- attr(*, "names")= chr [1:2] "null" "one"
Or would that require rewriting a lot of your code?
I have revised the function; it should now be consistent with the labelled-vector structure:
s1 <- c(0, 1, 4)
s1 <- sjlabelled::set_labels(s1, labels = c(null = 0, one = 1, four = 4))
str(s1)
#> num [1:3] 0 1 4
#> - attr(*, "labels")= Named num [1:3] 0 1 4
#> ..- attr(*, "names")= chr [1:3] "null" "one" "four"
s1 <- c(0, 1, 4)
s1 <- sjlabelled::set_labels(s1, labels = c("null", "one", "four"))
str(s1)
#> num [1:3] 0 1 4
#> - attr(*, "labels")= Named num [1:3] 0 1 4
#> ..- attr(*, "names")= chr [1:3] "null" "one" "four"
s2 <- haven::labelled(c(0, 1, 4), c(null = 0, one = 1, four = 4))
str(s2)
#> 'haven_labelled' num [1:3] 0 1 4
#> - attr(*, "labels")= Named num [1:3] 0 1 4
#> ..- attr(*, "names")= chr [1:3] "null" "one" "four"
s3 <- sjlabelled::as_label(s1, keep.labels = TRUE)
str(s3)
#> Factor w/ 3 levels "null","one","four": 1 2 3
#> - attr(*, "labels")= Named num [1:3] 0 1 4
#> ..- attr(*, "names")= chr [1:3] "null" "one" "four"
s4 <- sjlabelled::as_label(s2, keep.labels = TRUE)
str(s4)
#> Factor w/ 3 levels "null","one","four": 1 2 3
#> - attr(*, "labels")= Named num [1:3] 0 1 4
#> ..- attr(*, "names")= chr [1:3] "null" "one" "four"
sjlabelled::as_numeric(sjlabelled::as_label(s2, keep.labels = TRUE))
#> [1] 1 2 3
#> attr(,"labels")
#> null one four
#> 1 2 3
sjlabelled::as_numeric(sjlabelled::as_label(s1, keep.labels = TRUE))
#> [1] 1 2 3
#> attr(,"labels")
#> null one four
#> 1 2 3
sjlabelled::as_numeric(sjlabelled::as_label(s2, keep.labels = TRUE), use.labels = TRUE)
#> [1] 0 1 4
#> attr(,"labels")
#> null one four
#> 0 1 4
sjlabelled::as_numeric(sjlabelled::as_label(s1, keep.labels = TRUE), use.labels = TRUE)
#> [1] 0 1 4
#> attr(,"labels")
#> null one four
#> 0 1 4
Will commit later today.