uber / causalml

Uplift modeling and causal inference with machine learning algorithms

T-Learner ATE, SE calculations

ras44 opened this issue

Describe the bug
Less of a bug than a question:

The T-learner computes the ATE as the mean of the treatment effect te over all subjects, that is, the mean over all rows of the difference between each treatment group's model prediction and the control model's prediction:

for i, group in enumerate(self.t_groups):
    _ate = te[:, i].mean()
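
As a minimal sketch of this ATE step, assuming te is a NumPy array with one row per subject and one column per treatment group (the group names and values here are made up for illustration):

```python
import numpy as np

# Hypothetical stand-ins for self.t_groups and te; values are illustrative only.
t_groups = ["treatment_a", "treatment_b"]
te = np.array([[1.0, 2.0],
               [3.0, 4.0]])  # rows: all subjects, columns: treatment groups

# Column-wise mean over ALL rows, regardless of which group a subject belongs to.
ate = {group: te[:, i].mean() for i, group in enumerate(t_groups)}
```

Every row contributes to every group's ATE here, including subjects who received a different treatment.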

However, the standard error of the ATE is calculated on a filtered subset: only the subjects in a particular treatment group and those in the control group are included:

se = np.sqrt(
    (
        (y_filt[w == 0] - yhat_c[w == 0]).var() / (1 - prob_treatment)
        + (y_filt[w == 1] - yhat_t[w == 1]).var() / prob_treatment
        + (yhat_t - yhat_c).var()
    )
    / y_filt.shape[0]
)
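
To make the formula easier to experiment with, here is a self-contained sketch that wraps it in a function. The name filtered_se is hypothetical, the toy data is invented, and prob_treatment is assumed to be the share of treated rows within the filtered subset, which the snippet above does not show:

```python
import numpy as np

def filtered_se(y_filt, w, yhat_c, yhat_t):
    """SE formula quoted above, applied to one treatment-vs-control subset.

    Assumption: prob_treatment is the fraction of treated rows (w == 1).
    """
    y_filt = np.asarray(y_filt, dtype=float)
    w = np.asarray(w)
    yhat_c = np.asarray(yhat_c, dtype=float)
    yhat_t = np.asarray(yhat_t, dtype=float)
    prob_treatment = w.mean()

    # Residual variance of each arm, inverse-weighted by its share of rows.
    control_term = (y_filt[w == 0] - yhat_c[w == 0]).var() / (1 - prob_treatment)
    treated_term = (y_filt[w == 1] - yhat_t[w == 1]).var() / prob_treatment
    # Variance of the predicted effect across the filtered rows.
    effect_term = (yhat_t - yhat_c).var()

    return np.sqrt((control_term + treated_term + effect_term) / y_filt.shape[0])

# Toy subset: two control rows followed by two treated rows.
se = filtered_se(
    y_filt=[1.0, 2.0, 3.0, 4.0],
    w=[0, 0, 1, 1],
    yhat_c=[1.0, 1.0, 1.0, 1.0],
    yhat_t=[2.0, 2.0, 2.0, 2.0],
)
```

Only rows from the control group and one treatment group enter this calculation, which is the mismatch with the all-rows ATE raised in this issue.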

It seems that the subjects used in the ATE calculation should match the subjects used in the SE calculation. If all subjects are meant to be included, the SE could simply be the standard error of the te values over all subjects.

If instead the ATE is meant to be group-specific, so that not all subjects are included, then it seems we should have:

_ate = (yhat_t - yhat_c).mean()

And again the SE would simply be the standard error of the mean of that series:

se = np.sqrt((yhat_t - yhat_c).var() / yhat_t.shape[0])
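
For reference, the standard error of a sample mean is the standard deviation divided by the square root of the sample size. Below is a minimal sketch of the group-specific alternative under that definition. The helper name group_ate_and_se and the predictions are made up, and NumPy's default population variance (ddof=0) is used to match the snippets above:

```python
import numpy as np

def group_ate_and_se(yhat_t, yhat_c):
    """Mean predicted effect and its standard error over the same rows."""
    effects = np.asarray(yhat_t, dtype=float) - np.asarray(yhat_c, dtype=float)
    ate = effects.mean()
    # Standard error of the mean: sqrt(variance / n), with ddof=0 as in .var().
    se = np.sqrt(effects.var() / effects.shape[0])
    return ate, se

# Illustrative predictions for the rows of one treatment-vs-control subset.
ate, se = group_ate_and_se(
    yhat_t=[2.0, 3.0, 4.0, 5.0],
    yhat_c=[1.0, 1.0, 1.0, 1.0],
)
```

This quantity reflects only the spread of the predicted effects across rows. It does not account for uncertainty in the fitted treatment and control models themselves.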