Using sim_df with missing data goes bork
doomlab opened this issue
So, if I use `sim_df` with missing data, it breaks. You can find my example data, `midterm`, in my learnSTATS package here: https://github.com/doomlab/learnSTATS/tree/master/data
Then I was trying to do this:
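```r
library(faux)  # provides sim_df()

midterm <- sim_df(data = midterm,                      # data frame to base the simulation on
                  n = sample(50:100, 1),               # how many of each group
                  between = c("JOL_group", "type_cue")) # between-subject grouping columns
```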
I think the solution may be to include `na.rm = TRUE` in the section of `sim_df` below, if that makes sense for simulating. For what I was doing, I ended up running `na.omit()` on my data, then using your `messy()` function to add the missingness back in.
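```r
# Excerpt from inside sim_df(): y, n and empirical are defined earlier in the function
z <- rnorm_multi(
  n = n,
  vars = ncol(y),
  mu = sapply(y, mean, na.rm = TRUE),  # column means, ignoring NAs
  sd = sapply(y, sd, na.rm = TRUE),    # column SDs, ignoring NAs
  r = cor(y, use = "complete.obs"),    # cor() has no na.rm; without `use`, NAs give NA correlations
  varnames = names(y),
  empirical = empirical
)
```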
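The toy example below illustrates in base R why adding `na.rm = TRUE` to `mean()` and `sd()` is only part of the fix. `cor()` has no `na.rm` argument; it handles missing values through its `use` argument instead. The data frame `y` here is made up for illustration and is not the `midterm` data.

```r
# Hypothetical toy data (not the midterm data) with one NA per column
y <- data.frame(
  a = c(1, 2, NA, 4, 5),
  b = c(2, 4, 6, NA, 10)
)

mean(y$a)                       # NA: a single missing value propagates
sapply(y, mean, na.rm = TRUE)   # a = 3, b = 5.5
sapply(y, sd, na.rm = TRUE)     # SDs of the non-missing values

cor(y)                          # a-b correlation is NA; cor() has no na.rm
cor(y, use = "complete.obs")    # uses rows 1, 2 and 5 only; there b = 2 * a, so r = 1
```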
That's fixed now in the version on GitHub. I've added some tests, but I'd be grateful if you could try it on your own data and let me know whether the results make sense. You can now also add missingness with the same joint probabilities as in your own data.
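The base-R sketch below shows one way to read "the same joint probabilities". `copy_missingness()` is a hypothetical helper. It is not a faux function, and faux's actual implementation may work differently. The helper resamples whole rows of the observed data's `is.na()` mask. As a result, columns that tend to be missing together stay missing together, and each missingness pattern appears at roughly its observed rate. Dropping values independently column by column would not preserve that dependence.

```r
# Hypothetical helper (not part of faux): give `simulated` the same joint
# missingness patterns as `observed`, in expectation.
copy_missingness <- function(simulated, observed, cols = names(observed)) {
  # TRUE where a value is missing; each row is one joint missingness pattern
  miss <- is.na(observed[cols])
  # Resampling whole rows of the mask keeps the columns' missingness together,
  # so each pattern appears with about its observed relative frequency
  rows <- sample(nrow(miss), nrow(simulated), replace = TRUE)
  mask <- unname(miss[rows, , drop = FALSE])
  out <- simulated[cols]
  out[mask] <- NA
  simulated[cols] <- out
  simulated
}

set.seed(1)
observed <- data.frame(
  a = c(1, NA, 3, NA, 5, 6),
  b = c(2, NA, 6, NA, 10, 12)  # b is missing exactly when a is
)
simulated <- data.frame(a = rnorm(1000), b = rnorm(1000))
sim_na <- copy_missingness(simulated, observed)

identical(is.na(sim_na$a), is.na(sim_na$b))  # a and b are always missing together
mean(is.na(sim_na$a))                         # close to the observed rate of 2/6
```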
I had to calculate `r` with `use = "complete.obs"` because the pairwise version (`use = "pairwise.complete.obs"`) can produce matrices that are not positive definite.
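The small base-R illustration below uses deliberately extreme, made-up data to show why pairwise deletion is risky. Each correlation is computed on a different subset of rows, so the combined matrix need not be a valid correlation matrix. A function that needs a proper correlation matrix, such as `rnorm_multi()`, cannot use a matrix like this.

```r
# Made-up data: each pair of variables is observed on a different set of rows,
# and no row has all three values
df <- data.frame(
  x = c(1, 2, 3, NA, NA, NA, 1, 2, 3),
  y = c(1, 2, 3, 1, 2, 3, NA, NA, NA),
  z = c(NA, NA, NA, 1, 2, 3, 3, 2, 1)
)

# Pairwise deletion: x-y uses rows 1-3, y-z rows 4-6, x-z rows 7-9
r_pair <- cor(df, use = "pairwise.complete.obs")
r_pair  # x-y = +1 and y-z = +1, yet x-z = -1: impossible all at once

# A valid correlation matrix has no negative eigenvalues; this one has one
eigen(r_pair, symmetric = TRUE, only.values = TRUE)$values  # approximately 2, 2, -1
```

With `use = "complete.obs"`, every correlation comes from the same rows. A matrix built that way is positive semi-definite by construction. In this extreme example there are no complete rows at all, so that option would leave nothing to correlate.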
Thanks friend! Worked like a charm (and made sense).