benbhansen-stats / propertee

Prognostic Regression Offsets with Propagation of ERrors, for Treatment Effect Estimation (IES R305D210029).

Home Page: https://benbhansen-stats.github.io/propertee/

Warning when block() is used in rd_design(): damod_mm longer than w

kirkvanacore opened this issue

@adamSales and I received a warning when requesting the summary of an lmitt object fitted with a blocked RD design. The details are below.

When running this code:

des <- rd_design(Z ~ forcing(R) + unitid(id) + block(problem_id), data = ad[ad$R > -1 & ad$R < 11, ])
m1_bw2 <- glm(Y ~ R + Z, data = ad[ad$R > -1 & ad$R < 11, ], family = binomial)
res_BW2_1 <- lmitt(Y ~ 1, design = des, offset = cov_adj(m1_bw2), weights = "ate", data = ad[ad$R > -1 & ad$R < 11, ])
summary(res_BW2_1)

...we receive this warning along with the summary:

Warning message:
In damod_mm[msk, , drop = FALSE] * w :
longer object length is not a multiple of shorter object length

  • The warning does not occur without block(problem_id).
  • We suspect that this warning refers to a mismatch between the dimensions of damod_mm and the length of the weights.
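
For context, this R warning comes from recycling in elementwise arithmetic: when a matrix is multiplied by a shorter vector whose length does not divide the matrix's length, R recycles the vector anyway and warns. The base-R sketch below uses a made-up 4-by-2 matrix standing in for damod_mm and a length-3 vector standing in for a w that has lost one row; the numbers are illustrative only.

```r
# Toy stand-in for damod_mm: 4 rows, 2 columns (made-up values)
damod_mm <- matrix(1:8, nrow = 4, ncol = 2)

# Weights for only 3 of the 4 rows, as if one row had been dropped
w <- c(0.5, 1, 2)

# 8 elements is not a multiple of 3, so R recycles w and warns;
# capture the warning text instead of printing it
warn_msg <- NULL
res <- withCallingHandlers(
  damod_mm * w,
  warning = function(cond) {
    warn_msg <<- conditionMessage(cond)
    invokeRestart("muffleWarning")
  }
)
print(warn_msg)

# With one weight per row, recycling lines up with the rows
# (R stores matrices column by column), so each row is scaled
# by its own weight and no warning is raised
w_full <- c(w, 1)
res_ok <- damod_mm * w_full
```

Note that the recycled product res is still computed, which is why this mismatch surfaces only as a warning rather than an error.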

I suspect @jwasserman2 may be the better person to help debug this, but do you have missing data? It has come up occasionally, especially with Adam's real data, and we haven't always accounted for it properly.

There are no missing data for the variables used in this example.
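
One quick way to confirm this is to count NAs in exactly the variables the design and models use, restricted to the analysis window. The sketch below does that in base R on a made-up stand-in for ad (the real data are not reproduced here); only the column names and the R window come from the original code.

```r
# Made-up stand-in for `ad`; the real data are not reproduced here
ad <- data.frame(
  Y = c(1, 0, 1, 0),
  R = c(0.5, 2, 7, 12),
  Z = c(1, 1, 0, 0),
  id = 1:4,
  problem_id = c(3, 3, 9, 9)
)

# Variables referenced by rd_design(), glm(), and lmitt() in the issue
vars_used <- c("Y", "R", "Z", "id", "problem_id")

# Same analysis window as the original code
analysis_rows <- ad$R > -1 & ad$R < 11

# Number of NAs per variable within the window
na_counts <- colSums(is.na(ad[analysis_rows, vars_used]))
print(na_counts)
```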

It would be great if you could share the data. Can you, @kirkvanacore? Or, better yet, a stripped-down, privacy-preserving, data-use-agreement-compliant version of it that still produces the warning?

Here is a synthetic data set that produces the warning:

synth_dat_issue131.csv

Thanks for posting the data, @kirkvanacore. ate() is returning NAs, which causes rows to be dropped. In .get_a21(), this makes w (which is x$weights) shorter than damod_mm, which was created by passing na.pass to model.frame():

> nrow(ad[ad$R > -1 & ad$R < 11, ])
[1] 12283
> nrow(model.frame(res_BW2_1))
[1] 11930
> sum(is.na(ate(des, data = ad[ad$R > -1 & ad$R < 11, ])))
[1] 353
> 12283 - 11930
[1] 353
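
The same kind of row-count mismatch can be reproduced in miniature with base R's model.frame(). This is a toy illustration, not propertee's internals: the data frame d and its weight column w are hypothetical, with NA weights standing in for the NAs returned by ate().

```r
# Hypothetical toy data: two units have NA weights,
# mimicking the NAs that ate() returned in the issue
d <- data.frame(
  y = c(1, 0, 1, 1, 0),
  x = c(2, 4, 6, 8, 10),
  w = c(1, NA, 1, 1, NA)
)

# na.omit drops every row with an NA anywhere in the frame,
# including the "(weights)" column
mf_omit <- model.frame(y ~ x, data = d, weights = w, na.action = na.omit)

# na.pass keeps all rows, NA weights included
mf_pass <- model.frame(y ~ x, data = d, weights = w, na.action = na.pass)

n_dropped <- nrow(mf_pass) - nrow(mf_omit)
cat("rows kept with na.omit:", nrow(mf_omit), "\n")
cat("rows kept with na.pass:", nrow(mf_pass), "\n")
cat("rows dropped:", n_dropped, "| NA weights:", sum(is.na(d$w)), "\n")
```

As in the console output above, the number of dropped rows equals the number of NA weights.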

Thanks, all. I wonder what characterizes the blocks with NA weights?

Block IDs are numeric and run up to 252, but only 226 blocks exist; for example, there are no blocks 41 or 42. When we expand e_z (the block-level ratio of treated units to total units) to the observation level, we use e_z[blocks(design)[, 1]] (R/weights.R#L132). Because the block IDs are numbers, this indexes by position rather than by name: for block 245, e_z[245] asks for the 245th element of a length-226 vector and returns NA. What we want is e_z["245"], which returns the entry named "245".
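
The positional lookup is easy to reproduce with a small named vector. The values below are hypothetical; as in the real data, the block IDs are deliberately non-contiguous.

```r
# Hypothetical block-level treatment proportions, named by block ID.
# IDs are numeric but not contiguous: there are no blocks 2, 3, 5, or 6.
e_z <- c(`1` = 0.50, `4` = 0.40, `7` = 0.25)

# Unit-level block memberships, stored as numbers
unit_blocks <- c(1, 4, 4, 7)

# Numeric indices are positions, not names: e_z[4] and e_z[7] fall past
# the end of a length-3 vector and come back NA. Block 1 happens to sit
# at position 1, so its value looks correct by coincidence.
by_position <- e_z[unit_blocks]
n_na <- sum(is.na(by_position))
print(unname(by_position))
```

The coincidental match for block 1 is part of what makes this bug easy to miss: it only bites once block IDs stop lining up with positions.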

The solution could be as easy as e_z[as.character(blocks(design)[, 1])], but I don't have time to test it right now. I'll try to get to it this afternoon if no one else does.
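
A sketch of the name-based lookup, wrapped in a hypothetical helper (expand_by_block() is not part of propertee) that also stops with an error when a unit's block has no block-level value, so a future mismatch cannot silently turn into NA weights.

```r
# Hypothetical helper, not propertee's code: expand block-level values to
# units by looking them up by name, failing loudly on unknown blocks
expand_by_block <- function(block_values, unit_blocks) {
  keys <- as.character(unit_blocks)
  missing_keys <- setdiff(unique(keys), names(block_values))
  if (length(missing_keys) > 0) {
    stop("no block-level value for block(s): ",
         paste(missing_keys, collapse = ", "))
  }
  unname(block_values[keys])
}

# Same hypothetical block proportions and memberships as before
e_z <- c(`1` = 0.50, `4` = 0.40, `7` = 0.25)
unit_blocks <- c(1, 4, 4, 7)

unit_e_z <- expand_by_block(e_z, unit_blocks)
print(unit_e_z)
```

One caveat with keying on as.character(): the unit-level IDs must be formatted the same way as the vector's names. For example, as.character(1e5) gives "1e+05", so very large numeric block IDs could fail to match names that were created differently.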

The fix was as easy as expected. @kirkvanacore, I no longer get the warning with the synthetic data; please test with your real data and let me know.

When you do get to testing this with the real data, @kirkvanacore, please also check whether the lmitt(<...>, absorb=T) problem has been fixed. Josh E suspects that 5ffed0d4 will have taken care of it.

Hi @kirkvanacore, could you check against the real data and verify that you no longer get a warning (or any other sign of trouble)? If everything looks fine, this issue can be closed.

@benthestatistician @josherrickson My apologies for the delay. I no longer receive the error when running lmitt(<...>, absorb=T) against the real data.

Thanks Kirk!