debruine / faux

R functions for simulating factorial datasets

Home Page: https://debruine.github.io/faux/

Using sim_df with missing data goes bork

doomlab opened this issue

So, if I use sim_df on data with missing values, it does not like that. You can find my example dataset, midterm, in my learnSTATS package here: https://github.com/doomlab/learnSTATS/tree/master/data

Then I was trying to do this:

    midterm <- sim_df(data = midterm,          # data frame
                      n = sample(50:100, 1),   # how many in each group
                      between = c("JOL_group", "type_cue"))

I think the solution may be to include na.rm in this section of the code ... if that makes sense for simulating. I ended up using na.omit on my data for what I was doing, then used your messy function to add the missingness back in.

    z <- rnorm_multi(
      n = n,
      vars = ncol(y),
      mu = sapply(y, mean, na.rm = TRUE),
      sd = sapply(y, sd, na.rm = TRUE),
      r = cor(y),
      varnames = names(y),
      empirical = empirical
    )
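As a small base R illustration of why na.rm matters here (toy data, not the midterm dataset, and the variable names are made up): without it the column means come back NA, and cor() with its default settings still returns NA for the pair of columns, so the r argument probably needs its own handling as well.

```r
# Toy data frame with missing values (hypothetical, for illustration only)
y <- data.frame(a = c(1, 2, NA, 4), b = c(2, NA, 6, 8))

mu_default <- sapply(y, mean)                # NA for both columns
mu_narm    <- sapply(y, mean, na.rm = TRUE)  # means of the observed values
sd_narm    <- sapply(y, sd, na.rm = TRUE)    # SDs of the observed values

# cor() has no na.rm argument; with its default use = "everything",
# the NAs propagate into the correlation between a and b
r_default <- cor(y)
```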

That's fixed now in the version on GitHub. I have some tests, but would be grateful if you could test it on your own data and let me know if it makes sense. You can now add missingness with the same joint probabilities as your own data.
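To make "same joint probabilities" concrete, here is a rough base R sketch of one way to carry missingness patterns over to simulated data. It is not necessarily how faux does it internally; the data frames `observed` and `simulated` are hypothetical stand-ins. The idea is to resample whole rows of the missingness mask, so that NAs that co-occur in the original data co-occur at about the same rate in the simulated data.

```r
set.seed(8675309)

# Hypothetical original data with missing values
observed <- data.frame(a = c(1, NA, 3, NA, 5, 6),
                       b = c(NA, NA, 3, 4, 5, 6))

# Hypothetical complete simulated data with the same columns
simulated <- data.frame(a = rnorm(20), b = rnorm(20))

# Logical matrix: TRUE where the original value is missing
pattern <- is.na(observed)

# Resample whole rows of the mask so joint patterns are preserved
rows <- sample(nrow(pattern), nrow(simulated), replace = TRUE)
mask <- pattern[rows, , drop = FALSE]

# Blank out the masked cells in the simulated data
simulated[mask] <- NA
```

Sampling the columns' NA rates independently would match the marginal proportions but lose the fact that, in this toy data, a missing `a` often comes with a missing `b`.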

I had to calculate the r with use = "complete.obs" because the pairwise version can produce matrices that are not positive definite.
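A toy base R example of the difference (made-up data, constructed so the effect is obvious): each pairwise correlation is computed on a different subset of rows, so the entries need not be mutually consistent, and the resulting matrix can have a negative eigenvalue. With use = "complete.obs" every entry comes from the same rows, so that cannot happen.

```r
# Rows 1-4 are complete; rows 5-13 are each missing one variable.
# The incomplete rows are built so that x~y and y~z look strongly positive
# while x~z looks strongly negative, which no single dataset could show.
dat <- data.frame(
  x = c(1, -1,  0,  0, -10,  0, 10,  NA, NA, NA, -10,  0,  10),
  y = c(0,  0,  1, -1, -10,  0, 10, -10,  0, 10,  NA, NA,  NA),
  z = c(1,  1, -1, -1,  NA, NA, NA, -10,  0, 10,  10,  0, -10)
)

r_complete <- cor(dat, use = "complete.obs")           # rows 1-4 only
r_pairwise <- cor(dat, use = "pairwise.complete.obs")  # different rows per pair

# Smallest eigenvalue: positive means positive definite
min_eigen <- function(r) {
  min(eigen(r, symmetric = TRUE, only.values = TRUE)$values)
}

min_eigen(r_complete)  # positive
min_eigen(r_pairwise)  # negative, so not positive definite
```

The trade-off is that "complete.obs" discards every row with any missing value, so the correlations rest on fewer observations.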

Thanks friend! Worked like a charm (and made sense).