how to recode different 'types' of missing in R from a .dta file

Question

Laurent-Smeets-GSS-Account opened this issue 2 years ago · comments

Laurent-Smeets-GSS-Account commented 2 years ago

I have a Stata file (.dta) with different types of missing data: either a question wasn't relevant for this person (.) or the person didn't know the answer (.r). These differences matter for my analysis. I do not have access to Stata and would like to do this analysis in R. I looked at the {sjlabelled}, {labelled}, and {haven} packages, but cannot find a way to recode these different types of missing data.

The Stata command tab q2, m gives



               q2 |
xxxxxxxxxxxxxxxxx |
x sorry sensitive |
      xxxxxxxxxxx |
       xxxxxxxxxx |      Freq.     Percent        Cum.
------------------+-----------------------------------
               No |        342       14.43       14.43
              Yes |        673       28.40       42.83
                . |      1,234       52.07       94.89
               .r |        121        5.11      100.00
------------------+-----------------------------------
            Total |      2,370      100.00

However, in R there is no distinction between . and .r:

table(mydf$q2, useNA = "always")

gives


   0    1 <NA> 
 342  673 1355

However, R does recognise that there are different 'types' of missing (NA and NA(r))

sjlabelled::tidy_labels(mydf$q2)
<labelled<double>[2370]>: q2: xxxxx?
   [1]     1     NA   NA(r)     1     1    NA    NA    NA    NA    NA     0     0     1     1    NA     1    NA     1    NA    NA    NA    NA    NA     1    NA    NA    NA    NA(r)    NA    NA    NA(r)    NA

and

> get_labels(mydf$q2, values = "n", drop.na = FALSE)  
               -888                   0                   1 
"Unsure/Don’t Know"                "No"               "Yes"

How can I recode the Unsure/Don’t Know category to a regular (non-missing) value, while keeping the other missings actually missing?
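
One possible approach, as a minimal sketch rather than a confirmed answer: {haven} stores Stata's extended missing values such as .r as "tagged" NAs, and `is_tagged_na()` can pick out a specific tag. The toy vector below is a stand-in for mydf$q2, and the replacement code 2 is an arbitrary choice made for illustration, not something taken from the data.

```r
library(haven)

# Toy stand-in for mydf$q2 (assumption: the real column has the same structure):
# 0 = No, 1 = Yes, plain NA = Stata ".", tagged NA "r" = Stata ".r"
q2 <- labelled(
  c(1, NA, tagged_na("r"), 0, 1, NA),
  labels = c("No" = 0, "Yes" = 1, "Unsure/Don't Know" = tagged_na("r"))
)

# Drop the labelled class; the underlying doubles keep their NA tags
q2_num <- zap_labels(q2)

# TRUE only for NAs tagged "r"; plain NAs and real values are FALSE
is_unsure <- is_tagged_na(q2_num, "r")

# Recode ".r" to a real value (2 is an arbitrary, hypothetical code)
q2_num[is_unsure] <- 2

# Optional: turn the result into a factor with readable levels
q2_fct <- factor(
  q2_num,
  levels = c(0, 1, 2),
  labels = c("No", "Yes", "Unsure/Don't Know")
)

table(q2_fct, useNA = "always")
```

With this, the plain missings (Stata .) should stay NA while the .r cases become their own category. Applying the same steps to mydf$q2 assumes that read_dta() imported .r as a tagged NA with tag "r", which the NA(r) entries in the printed vector suggest.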