Several issues with sjlabelled (missing value labels) and other sjverse packages
cschwem2er opened this issue
Hi Daniel,
while using the sjverse packages for teaching data analysis this term, we noticed some issues with the current versions, starting with missing value labels after reading in a Stata dataset. I have documented all issues in an R Markdown file, which is available here.
As for the issue specifically related to sjlabelled, value labels seem to get lost somehow, although they definitely exist in the original Stata dataset:
```r
library(tidyverse)
library(sjPlot)
library(sjmisc)
library(sjlabelled)
library(sjstats)

# local folder containing the GLES pre-election cross-section
setwd('C:/Dropbox/lehre/Methoden der politischen Soziologie/0 Daten/GLES Vorwahl')

# read_stata() is provided by sjlabelled here (tidyverse does not attach haven)
d <- read_stata("GLES_Vorwahlquerschnitt_ZA5700_v1-0-0.dta")

# frequency table of the intended second vote; value labels should show up here
frq(d$q11bb)
```

```
# Beabsichtigte Stimmabgabe: Zweitstimme (Version B) (x) <numeric>
# total N=2001  valid N=2001  mean=-18.09  sd=70.99

  val frq raw.prc valid.prc cum.prc
  -99 109    5.45      5.45    5.45
  -98 176    8.80      8.80   14.24
  -97 313   15.64     15.64   29.89
  -83  12    0.60      0.60   30.48
    1 556   27.79     27.79   58.27
    4 376   18.79     18.79   77.06
    5  79    3.95      3.95   81.01
    6 161    8.05      8.05   89.06
    7 144    7.20      7.20   96.25
  171   2    0.10      0.10   96.35
  180   4    0.20      0.20   96.55
  206   5    0.25      0.25   96.80
  209   3    0.15      0.15   96.95
  215  31    1.55      1.55   98.50
  225   1    0.05      0.05   98.55
  237   2    0.10      0.10   98.65
  322  27    1.35      1.35  100.00
 <NA>   0    0.00        NA      NA
```
The dataset for reproduction and the `sessionInfo()` output are available in the R Markdown file linked above.
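To narrow down whether the labels are already gone right after import (which would point to haven) or only disappear later, one can inspect the label attribute directly. The following is a minimal base-R sketch, assuming value labels are stored as a named numeric vector in a `"labels"` attribute, as haven does for labelled vectors. The helpers `has_value_labels()` and `unlabelled_values()` are hypothetical and not part of sjlabelled or haven; the example vector and its labels are made up.

```r
# Sketch (hypothetical helpers): check whether value labels survived import.
# Assumes labels live in a "labels" attribute, as haven stores them.

# TRUE if the vector carries a non-empty "labels" attribute.
has_value_labels <- function(v) {
  labs <- attr(v, "labels", exact = TRUE)
  !is.null(labs) && length(labs) > 0
}

# Observed (non-missing) values that have no matching value label.
unlabelled_values <- function(v) {
  labs <- attr(v, "labels", exact = TRUE)
  vals <- as.numeric(unclass(v))  # plain numbers, class and attributes dropped
  vals <- sort(unique(vals[!is.na(vals)]))
  vals[!vals %in% labs]
}

# Made-up example vector standing in for an imported Stata column.
x <- c(-99, -98, 1, 4, 1, 171)
attr(x, "labels") <- c("missing A" = -99, "missing B" = -98,
                       "category 1" = 1, "category 4" = 4)

has_value_labels(x)              # TRUE
unlabelled_values(x)             # 171
has_value_labels(as.numeric(x))  # FALSE: as.numeric() drops attributes
```

Applied to the real data, `has_value_labels(d$q11bb)` returning `FALSE` directly after `read_stata()` would suggest the labels are lost during import rather than by `frq()`.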
- Problem 1) This might be related to this haven issue: tidyverse/haven#359
- Problem 2) Can't reproduce. Since I have revised `descr()`, this issue might no longer exist in the dev version of sjmisc.
- Problem 3) I think this is due to a wrong call to `grpmean()`. The function requires two variables: the numeric one (for the mean) and the categorical one (for the groups). You only defined the first variable. This example works for me:
```r
# for each q1 subgroup: mean of q62 within each category of q3
d %>%
  group_by(q1) %>%
  grpmean(q62, q3)
```
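For readers without the sjverse packages at hand, the grouped call above roughly corresponds to computing the mean of q62 per category of q3, separately within each q1 subgroup. A base-R sketch of just those group means, using a small made-up data frame in place of the GLES data (all values are hypothetical); `grpmean()` prints further statistics that this sketch does not reproduce:

```r
# Hypothetical toy data standing in for the GLES variables q1, q3 and q62.
toy <- data.frame(
  q1  = c(1, 1, 1, 1, 2, 2, 2, 2),
  q3  = c(1, 1, 2, 2, 1, 1, 2, 2),
  q62 = c(2, 4, 6, 8, 1, 3, 5, 7)
)

# Mean of q62 for each q3 category, separately within each q1 subgroup.
res <- aggregate(q62 ~ q3 + q1, data = toy, FUN = mean)
res
# Expected rows (q3, q1, q62): (1, 1, 3), (2, 1, 7), (1, 2, 2), (2, 2, 6)
```

Both a grouping variable (q1) and a group variable for the means (q3) are needed here, which is why the call takes two variables.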
Thanks for investigating.
- Problem 1: This really seems to be a haven issue; with the earlier haven version 1.1.0 we did not experience any problems with missing value labels. I really hope the haven guys will fix this soon, as it also affects many sjverse users.
- Problem 2: It does occur with the current CRAN version of sjmisc, but not with the dev version. Could you please consider pushing the fix to CRAN soon? As grouping and descriptive statistics are very common procedures, I think this is an important bug fix.
- Problem 3: I know that `grpmean()` is supposed to take two variables as input. My idea was that it could automatically detect when a grouped data frame is passed and then use the grouping variable accordingly, so that only one additional variable needs to be specified. But this is really just a minor "nice to have" idea and not important.
Feel free to close this issue :)
The idea behind grouping and `grpmean()` is that you can compute "grouped means" for subgroups of a data frame. So grouping data frames would not make sense if I used this group structure as the "grouping variable" for `grpmean()`, or am I confusing something here?
The next round of updates to CRAN is planned; I think it will be published sometime next week.
> The idea behind grouping and `grpmean()` is that you can compute "grouped means" for subgroups of a data frame. So grouping data frames would not make sense if I used this group structure as the "grouping variable" for `grpmean()`, or am I confusing something here?
No, sorry, you are right. My use case was really just the mean of a variable for each group in a grouped data frame, not means of subgroups within an already grouped data frame. Combining both in one function would probably not be a good idea.
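The simpler use case described here, one mean per group with a single grouping variable, needs no second variable at all. With dplyr this is typically written as `d %>% group_by(q1) %>% summarise(mean_q62 = mean(q62, na.rm = TRUE))`; a base-R sketch of the same idea on made-up toy data (the values are hypothetical) looks like this:

```r
# Hypothetical toy data; only one grouping variable (q1) is involved.
toy <- data.frame(
  q1  = c(1, 1, 1, 1, 2, 2, 2, 2),
  q62 = c(2, 4, 6, 8, 1, 3, 5, 7)
)

# One mean of q62 per q1 group, as opposed to nested subgroup means.
means_per_group <- tapply(toy$q62, toy$q1, mean)
means_per_group
# Expected: group "1" -> 5, group "2" -> 4
```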