How to recode different 'types' of missing values in R from a .dta file
Laurent-Smeets-GSS-Account opened this issue · comments
I have a Stata file (.dta) with different types of missing data: a value is missing either because a question wasn't relevant for this person (.) or because a person didn't know the answer (.r). These differences matter for my analysis. I do not have access to Stata and would like to do this analysis in R. I looked at the {sjlabelled}, {labelled}, and {haven} packages, but cannot find a way to recode these different types of missing data.
The Stata command tab q2, m
gives
               q2 |
xxxxxxxxxxxxxxxxx |
x sorry sensitive |
      xxxxxxxxxxx |
       xxxxxxxxxx |      Freq.     Percent        Cum.
------------------+-----------------------------------
               No |        342       14.43       14.43
              Yes |        673       28.40       42.83
                . |      1,234       52.07       94.89
               .r |        121        5.11      100.00
------------------+-----------------------------------
            Total |      2,370      100.00
However, R makes no distinction between . and .r:
table(mydf$q2, useNA = "always")
gives
0 1 <NA>
342 673 1355
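The same collapse can be reproduced on a small made-up vector (not my data), assuming that .r arrives in R as a haven "tagged" missing value. The vector `toy` and its values are purely illustrative.

```r
library(haven)

# Toy stand-in: 0 = No, 1 = Yes, plain NA for Stata's ".",
# and tagged_na("r") for Stata's ".r" (assumed representation)
toy <- c(0, 1, NA, tagged_na("r"))

# Base R treats both kinds of missing as plain NA
is.na(toy)

# ... so table() lumps them into a single <NA> cell
table(toy, useNA = "always")
```

Both missing entries count as NA for base R, which would explain why 1,234 + 121 shows up as a single 1355.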
However, R does recognise that there are different 'types' of missing (NA and NA(r)):
sjlabelled::tidy_labels(mydf$q2)
<labelled<double>[2370]>: q2: xxxxx?
[1] 1 NA NA(r) 1 1 NA NA NA NA NA 0 0 1 1 NA 1 NA 1 NA NA NA NA NA 1 NA NA NA NA(r) NA NA NA(r) NA
and
> get_labels(mydf$q2, values = "n", drop.na = FALSE)
-888 0 1
"Unsure/Don’t Know" "No" "Yes"
How can I recode the Unsure/Don’t Know category to a regular (non-missing) value instead of a missing one, while keeping the other missing values actually missing?
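One direction that might work, sketched on a made-up vector rather than the real file: haven's na_tag() reports the letter behind each tagged missing value, so the .r cases can be picked out and given an ordinary code. The toy values, the object names (`raw`, `tags`, `recoded`, `q2_f`) and the reuse of -888 as the replacement code are assumptions for illustration. A column read with read_dta() is a labelled vector, and I am assuming the same helpers apply to it (possibly after unclass()).

```r
library(haven)

# Toy stand-in for mydf$q2 (hypothetical values):
# 0 = No, 1 = Yes, NA = Stata ".", tagged_na("r") = Stata ".r"
raw <- c(1, NA, tagged_na("r"), 0, tagged_na("r"), NA)

# na_tag() gives the tag letter for tagged NAs and NA for everything else
tags <- na_tag(raw)

# Give the ".r" cases an ordinary value; plain NAs are left untouched
recoded <- raw
recoded[!is.na(tags) & tags == "r"] <- -888

# Turn the result into a factor with readable labels
q2_f <- factor(
  recoded,
  levels = c(0, 1, -888),
  labels = c("No", "Yes", "Unsure/Don't Know")
)

table(q2_f, useNA = "always")
```

In this sketch only the plain NA entries remain missing, while the former .r entries become their own category.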