Confidence Interval for categorical outcome
ellpri opened this issue
Hi @kbattocchi (Keith), I am building a CausalForest where I have a treatment variable that is multi-categorical [0, 1, 2, 3, 5] and the outcome is [0, 1], where 1 means severe.
```python
from econml.dml import CausalForestDML
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

econml_causalForest = CausalForestDML(
    model_y=RandomForestRegressor(random_state=42),
    model_t=RandomForestClassifier(min_samples_leaf=10, random_state=42),
    discrete_treatment=True, cv=3, random_state=123,
)
econml_causalForest.fit(Y=y_train, T=T_train, X=X_train, W=None)
print(f'econml_ATE_forest: {econml_causalForest.ate(X_test, T0=0, T1=5)}')
print(econml_causalForest.summary())
print(econml_causalForest.ate_inference(X))
```
I got the following results:
```
Doubly Robust ATE on Training Data Results
=====================================================================
           point_estimate  stderr   zstat  pvalue  ci_lower  ci_upper
---------------------------------------------------------------------
ATE|T0_1            0.128    0.02   6.402     0.0     0.089     0.167
ATE|T0_2            0.143   0.019   7.596     0.0     0.106      0.18
ATE|T0_3            0.164    0.02    8.35     0.0     0.126     0.203
ATE|T0_5            0.313    0.02  15.827     0.0     0.274     0.352

econml_ATE_forest: 0.27076799164408494
```

```
Uncertainty of Mean Point Estimate
=====================================================================
mean_point  stderr_mean  zstat  pvalue  ci_mean_lower  ci_mean_upper
---------------------------------------------------------------------
     0.109        1.059  0.103   0.918         -1.968          2.185

Distribution of Point Estimate
============================================
std_point  pct_point_lower  pct_point_upper
--------------------------------------------
    0.946           -0.263            0.233

Total Variance of Point Estimate
=============================================
stderr_point  ci_point_lower  ci_point_upper
---------------------------------------------
       1.421          -0.374           0.377
```
Which results should I take into consideration, the Doubly Robust ones or the DoubleML ones? The two ATE estimates are different. And how should I interpret the ATE and CI?
If you just care about the ATE on the training set, then use the doubly robust ATE (which you can get programmatically from the `ate_` attribute). The `ate()` method is more flexible, allowing you to also compute the ATE for other populations X, but it is not doubly robust.
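As a minimal sketch of those two access paths, reusing the estimator and variable names from the question (treat the exact output shapes as illustrative, they can differ by econml version):

```python
# Doubly robust ATE on the training data, stored on the fitted estimator
# (one value per non-baseline treatment level, matching the summary table).
print(econml_causalForest.ate_)

# Model-based ATE for an arbitrary population, e.g. the test set --
# more flexible, but not doubly robust.
print(econml_causalForest.ate(X_test, T0=0, T1=5))
print(econml_causalForest.ate_interval(X_test, T0=0, T1=5, alpha=0.05))  # CI for that ATE
```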
In terms of interpretation, a value of 0.313 means that increasing the probability of assigning an individual to treatment 5 instead of treatment 0 by p will increase the likelihood of a severe outcome by 0.313·p. (This estimate is linear in the treatment probability, which may not be completely realistic for a discrete outcome: for some values of X we may see small variations in treatment that correspond to large variations in the outcome, which would extrapolate to more than a 100% change in severity probability given a 100% change in treatment from one level to another, which is impossible.)
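For concreteness, a tiny worked illustration of that linear reading, using the 0.313 estimate from the table above (the probability shifts below are made-up values):

```python
# Illustrative arithmetic only: the estimated ATE for treatment 5 vs. treatment 0.
ate_t5_vs_t0 = 0.313

# A shift of p in the probability of receiving treatment 5 instead of treatment 0
# changes the predicted probability of a severe outcome by roughly ate * p.
for p in (0.25, 0.50, 1.00):
    print(f"shift of {p:.0%}: change in P(severe) ~ {ate_t5_vs_t0 * p:.3f}")
```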
@kbattocchi Hi Keith, thanks for the reply. I am working with accident data. The treatment variable taken here is the relative velocity: '5' indicates more than 80 km/h and '0' is 20 km/h. The target variable is injury severity. I expected a result that would say: if the relative velocity changes from 0 to 5, it increases the injury severity probability by x. But the way you interpreted it is a little different.
- So is the use case not applicable here? In general, I want to analyse the parameters from the accident database and their influence on injury severity, which is a categorical variable.
- As you mentioned, the ATE here is linear, so should I use a treatment featurizer?
@ellpri I think that my answer is consistent with what you're looking for: changing 100% from '0' to '5' means changing the severity probability by 100% of 0.313, i.e. increasing it by 0.313. I only added the caveat because the linearity of the model is not necessarily completely realistic for discrete outcomes. We perform the estimate conditional on X by regressing the unexpected variation in the outcome conditional on X and W on the unexpected variation in the treatment conditional on X and W, and empirically it's possible that for some X there was a big unexpected change in Y (there was a severe injury when we thought that was only 10% likely given X, say) but only a small unexpected change in T (the relative velocity was 5, and we thought that was 95% likely given X). In that case it looks like a very small change in T leads to a big change in Y, which will extrapolate to a more than 100% change in outcome given a change in treatment from '0' to '5'. Despite this, empirically DML seems to generally perform well with discrete outcomes, even though theoretically something like a "double machine learning for logistic regression" setup might be more appropriate.
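To make that residual-on-residual step concrete, here is a minimal sketch of the two-stage idea using plain scikit-learn models on synthetic placeholder data. The arrays `X`, `T`, `Y` and the simple linear final stage are illustrative assumptions, not the exact CausalForestDML internals:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

# Synthetic placeholder data standing in for the accident dataset:
# X = features, T = discrete velocity level (0..3 here), Y = binary severity.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
T = rng.integers(0, 4, size=1000)
Y = (rng.random(1000) < 0.2 + 0.1 * (T == 3)).astype(int)

# Stage 1: out-of-fold predictions of the outcome and of the treatment-level
# probabilities given X (the "expected" parts referred to above).
y_hat = cross_val_predict(RandomForestRegressor(random_state=0), X, Y, cv=3)
t_prob = cross_val_predict(RandomForestClassifier(random_state=0), X, T, cv=3,
                           method="predict_proba")

# Stage 2: regress the unexpected variation in Y on the unexpected variation
# in the treatment indicators (baseline level 0 dropped). The coefficients
# play the role of per-level effects; a causal forest makes this final stage
# flexible in X instead of a single linear fit.
t_onehot = np.eye(t_prob.shape[1])[T][:, 1:]
y_res = Y - y_hat
t_res = t_onehot - t_prob[:, 1:]
effects = LinearRegression(fit_intercept=False).fit(t_res, y_res).coef_
print(effects)  # approximate effect of each level relative to level 0
```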
The treatment featurizer won't affect this: you're already fitting a CATE model that is flexible in X (because you are using CausalForestDML), so featurizing X won't buy you anything. The linearity I'm talking about is linearity in the treatment (probability), and with discrete treatments the model is linear in the treatment without loss of generality.
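As a practical follow-up (a sketch, not something stated in the thread): because the discrete treatment is already one-hot encoded internally, the fitted estimator can report effects and confidence intervals for each velocity level against the baseline directly, e.g. for the 0 → 5 comparison used above:

```python
# Hedged sketch: per-individual effects and confidence intervals from the
# fitted estimator above; `X_test` reuses the thread's variable name.
eff = econml_causalForest.effect(X_test, T0=0, T1=5)                    # CATE for 0 -> 5
lb, ub = econml_causalForest.effect_interval(X_test, T0=0, T1=5, alpha=0.05)
print(eff.mean(), lb.mean(), ub.mean())
```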