strengejacke / sjlabelled

Working with Labelled Data in R

Home Page: https://strengejacke.github.io/sjlabelled


Retrieving original numeric values to make as_numeric(as_label(x)) consistent

marianschmidt opened this issue · comments

Labels based on named vectors are very handy for reliably converting numeric values to factors with correctly named factor levels. However, I currently cannot find a way to retrieve the original numeric values of the labels when converting a labelled factor back to numeric.
I always thought the use.labels = TRUE option did this, but I found out that I had misunderstood its functionality.
Is there a way to create an exact reverse of the as_label() function, so that x and as_numeric(as_label(x)) are the same?

Example code:


library(sjlabelled)

# create data frame x and set labels
x <- data.frame(a = c(0, 1, 0))

x$a <- set_labels(x$a, labels = c(`null` = 0, `one` = 1))

get_labels(x$a)

as_numeric(as_label(x$a))
x$a

# option use.labels fails in this example because the labels are non-numeric
as_numeric(as_label(x$a), use.labels = TRUE)

The problem is that as_label() drops all information on label-value associations. It works if you add the labels back again:

f <- as_label(x$a)
f <- set_labels(f, labels = c(null = 0, one = 1))
as_numeric(f, use.labels = TRUE)

I could make as_label() add the values as labels; then it should work without the set_labels() call in between.
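To illustrate why the association is lost, here is a minimal base R sketch. It does not use sjlabelled and is not its implementation; the variable names are made up for illustration. A plain factor keeps only the level names and consecutive integer codes, so neither the codes nor the level text lead back to 0 and 1.

```r
# Illustration only: plain base R, not sjlabelled internals.
values <- c(0, 1, 0)
labels <- c(null = 0, one = 1)

# Build a factor whose levels are the label names.
f <- factor(values, levels = unname(labels), labels = names(labels))

# A plain factor carries only "levels" and "class"; the values 0 and 1 are gone.
names(attributes(f))

# The integer codes are consecutive level positions, not the original values.
codes <- as.integer(f)
codes
#> [1] 1 2 1

# Parsing the level text fails as well, because "null" and "one" are not numbers.
parsed <- suppressWarnings(as.numeric(as.character(f)))
parsed
#> [1] NA NA NA
```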

Great, I did not know that. Yes, I would say as_label() should have either i) an option keep.labels = or ii) have an equivalent function that automatically keeps labels or iii) keep labels by default (however, I don't know if this might break code for other users - on the other hand it would be consistent with as_numeric() where keep.labels = TRUE by default).

And I guess use.labels = TRUE works as I originally understood it, using the right-hand numeric part of the value labels. Maybe you could improve the documentation in this paragraph:

Logical, if TRUE and x has numeric value labels, these value labels will be set as numeric values.

to something along these lines:

if TRUE and x has numeric value labels, the values defined in the labels (right-hand side of labels = c(null = 0, one = 1)) will be set as numeric values (instead of consecutive factor level numbers)

use.labels = TRUE does not work exactly like that here, because values and value names are reversed for label attributes. But it now works this way:

library(sjlabelled)
x <- data.frame(a = c(0, 1, 0))
x$a <- set_labels(x$a, labels = c(`null` = 0, `one` = 1))
as_numeric(as_label(x$a))
#> [1] 1 2 1
#> attr(,"labels")
#> null  one 
#>    1    2

Created on 2019-03-03 by the reprex package (v0.2.1)

To me this behavior is very counterintuitive and might be dangerous, especially when labelled data are used for later calculations. I would expect as_numeric(as_label(x$a, keep.labels = TRUE)) not to change the value labels automatically. After defining "null" = 0, I would expect "null" to stay 0 and not become 1.
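A small base R sketch of why the shift matters for calculations (illustration only, with made-up variable names): once 0/1 has silently become 1/2, every summary statistic that depends on location is off.

```r
# Illustration only: compare the original values with consecutive factor codes.
values <- c(0, 1, 0)
codes <- as.integer(factor(values))  # 1 2 1

mean(values)  # 1/3
mean(codes)   # 4/3, shifted by exactly 1

# A 0/1 indicator can be summed to count the "one" cases; the codes cannot.
sum(values)
#> [1] 1
sum(codes)
#> [1] 4
```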

library(sjlabelled)
x <- data.frame(a = c(0, 1, 0))
x$a <- set_labels(x$a, labels = c(`null` = 0, `one` = 1))
as_numeric(as_label(x$a, keep.labels = TRUE))
#> [1] 1 2 1
#> attr(,"labels")
#> null  one 
#>    1    2

From the current documentation, I thought as_numeric(as_label(x$a, keep.labels = TRUE), use.labels = TRUE) would work, but it causes an error, probably due to the reversal of value names and values that you mentioned.

The problem can be demonstrated with this example:

library(sjlabelled)
x <- factor(c("None", "Little", "Some", "Lots"))
x <- set_labels(
  x,
  labels = c(None = "0.5", Little = "1.3", Some = "1.8", Lots = ".2")
)
str(x)
#>  Factor w/ 4 levels "Little","Lots",..: 3 1 4 2
#>  - attr(*, "labels")= Named num [1:4] 0.2 0.5 1.3 1.8
#>   ..- attr(*, "names")= chr [1:4] "Lots" "None" "Little" "Some"
get_labels(x)
#> [1] "Lots"   "None"   "Little" "Some"

y <- c(0, 1, 0)
y <- set_labels(y, labels = c(`null` = 0, `one` = 1))
y <- as_label(y, keep.labels = TRUE)
str(y)
#>  Factor w/ 2 levels "null","one": 1 2 1
#>  - attr(*, "labels")= Named chr [1:2] "null" "one"
#>   ..- attr(*, "names")= chr [1:2] "0" "1"
get_labels(y)
#> [1] "0" "1"

This is the "typical" behaviour. But I see that it's probably unintended here, so I might make an exception for as_label(), so as_numeric(as_label()) would work.

Do you think it would make sense to always store the labels as a named numerical vector, independent of the actual data type of the object? So that:

y <- c(0, 1, 0)
y <- set_labels(y, labels = c(`null` = 0, `one` = 1))
y <- as_label(y, keep.labels = TRUE)
str(y)
#>  Factor w/ 2 levels "null","one": 1 2 1
#>  - attr(*, "labels")= Named num [1:2] 0 1 
#>   ..- attr(*, "names")= chr [1:2] "null" "one"

Or would that require a lot of rewriting your code?
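To make the proposal concrete, here is a rough base R sketch of what a round trip could look like if the factor always carried its labels as a named numeric vector. The helpers to_labelled_factor() and from_labelled_factor() are hypothetical names for this illustration; they are not sjlabelled functions and say nothing about how the package is implemented.

```r
# Hypothetical helper: turn a numeric vector into a factor that keeps the
# value labels as a named numeric "labels" attribute.
to_labelled_factor <- function(x, labels) {
  f <- factor(x, levels = unname(labels), labels = names(labels))
  attr(f, "labels") <- labels
  f
}

# Hypothetical inverse: look each level name up in the stored labels
# to get the original numeric value back.
from_labelled_factor <- function(f) {
  labels <- attr(f, "labels")
  unname(labels[as.character(f)])
}

y <- c(0, 1, 4, 1)
lab <- c(null = 0, one = 1, four = 4)

f <- to_labelled_factor(y, lab)
round_trip <- from_labelled_factor(f)
round_trip
#> [1] 0 1 4 1
```

Because the lookup goes through the label names rather than the factor codes, non-consecutive values such as 4 survive the round trip.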

I have revised the function; it should now be consistent with the labelled-vector structure:

s1 <- c(0, 1, 4)
s1 <- sjlabelled::set_labels(s1, labels = c(null = 0, one = 1, four = 4))
str(s1)
#>  num [1:3] 0 1 4
#>  - attr(*, "labels")= Named num [1:3] 0 1 4
#>   ..- attr(*, "names")= chr [1:3] "null" "one" "four"

s1 <- c(0, 1, 4)
s1 <- sjlabelled::set_labels(s1, labels = c("null", "one", "four"))
str(s1)
#>  num [1:3] 0 1 4
#>  - attr(*, "labels")= Named num [1:3] 0 1 4
#>   ..- attr(*, "names")= chr [1:3] "null" "one" "four"

s2 <- haven::labelled(c(0, 1, 4), c(null = 0, one = 1, four = 4))
str(s2)
#>  'haven_labelled' num [1:3] 0 1 4
#>  - attr(*, "labels")= Named num [1:3] 0 1 4
#>   ..- attr(*, "names")= chr [1:3] "null" "one" "four"

s3 <- sjlabelled::as_label(s1, keep.labels = TRUE)
str(s3)
#>  Factor w/ 3 levels "null","one","four": 1 2 3
#>  - attr(*, "labels")= Named num [1:3] 0 1 4
#>   ..- attr(*, "names")= chr [1:3] "null" "one" "four"

s4 <- sjlabelled::as_label(s2, keep.labels = TRUE)
str(s4)
#>  Factor w/ 3 levels "null","one","four": 1 2 3
#>  - attr(*, "labels")= Named num [1:3] 0 1 4
#>   ..- attr(*, "names")= chr [1:3] "null" "one" "four"


sjlabelled::as_numeric(sjlabelled::as_label(s2, keep.labels = TRUE))
#> [1] 1 2 3
#> attr(,"labels")
#> null  one four 
#>    1    2    3
sjlabelled::as_numeric(sjlabelled::as_label(s1, keep.labels = TRUE))
#> [1] 1 2 3
#> attr(,"labels")
#> null  one four 
#>    1    2    3


sjlabelled::as_numeric(sjlabelled::as_label(s2, keep.labels = TRUE), use.labels = TRUE)
#> [1] 0 1 4
#> attr(,"labels")
#> null  one four 
#>    0    1    4
sjlabelled::as_numeric(sjlabelled::as_label(s1, keep.labels = TRUE), use.labels = TRUE)
#> [1] 0 1 4
#> attr(,"labels")
#> null  one four 
#>    0    1    4

I will commit this later today.