Confidence Interval for categorical outcome
ellpri opened this issue
Hi @kbattocchi (Keith), I am building a CausalForest where I have a treatment variable that is multi-categorical [0, 1, 2, 3, 5] and the outcome is [0, 1], where 1 means severe.
```python
from econml.dml import CausalForestDML
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

econml_causalForest = CausalForestDML(
    model_y=RandomForestRegressor(random_state=42),
    model_t=RandomForestClassifier(min_samples_leaf=10, random_state=42),
    discrete_treatment=True, cv=3, random_state=123,
)
econml_causalForest.fit(Y=y_train, T=T_train, X=X_train, W=None)
print(f'econml_ATE_forest: {econml_causalForest.ate(X_test, T0=0, T1=5)}')
print(econml_causalForest.summary())
print(econml_causalForest.ate_inference(X))
```
I got the following results:
```
Doubly Robust ATE on Training Data Results
=====================================================================
           point_estimate  stderr   zstat  pvalue  ci_lower  ci_upper
---------------------------------------------------------------------
ATE|T0_1            0.128    0.02   6.402     0.0     0.089     0.167
ATE|T0_2            0.143   0.019   7.596     0.0     0.106      0.18
ATE|T0_3            0.164    0.02    8.35     0.0     0.126     0.203
ATE|T0_5            0.313    0.02  15.827     0.0     0.274     0.352

econml_ATE_forest: 0.27076799164408494
```

```
Uncertainty of Mean Point Estimate
=====================================================================
mean_point  stderr_mean  zstat  pvalue  ci_mean_lower  ci_mean_upper
---------------------------------------------------------------------
     0.109        1.059  0.103   0.918         -1.968          2.185

Distribution of Point Estimate
============================================
std_point  pct_point_lower  pct_point_upper
--------------------------------------------
    0.946           -0.263            0.233

Total Variance of Point Estimate
=============================================
stderr_point  ci_point_lower  ci_point_upper
---------------------------------------------
       1.421          -0.374           0.377
```
Which results should I take into consideration, the Doubly Robust ones or the DoubleML ones? The two ATE estimates are different. And how should I interpret the ATE and CI?
If you just care about the ATE on the training set, then use the doubly robust ATE (which you can get programmatically from the `ate_` attribute). The `ate()` method is more flexible, allowing you to also compute the ATE for other populations X, but it is not doubly robust.
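As a minimal sketch of those two access paths, reusing the estimator and variable names from the question (treat the exact output shapes as illustrative, they can differ by econml version):

```python
# Doubly robust ATE on the training data, stored on the fitted estimator
# (one value per non-baseline treatment level, matching the summary table).
print(econml_causalForest.ate_)

# Model-based ATE for an arbitrary population, e.g. the test set --
# more flexible, but not doubly robust.
print(econml_causalForest.ate(X_test, T0=0, T1=5))
print(econml_causalForest.ate_interval(X_test, T0=0, T1=5, alpha=0.05))  # CI for that ATE
```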
In terms of interpretation, a value of 0.313 means that increasing the probability of assigning an individual to treatment 5 instead of treatment 0 by p will increase the likelihood of a severe outcome by 0.313·p. (This estimate is linear in the treatment probability, which may not be completely realistic for a discrete outcome: for some values of X we may see small variations in treatment that correspond to large variations in the outcome, which would extrapolate to more than a 100% change in severity probability given a 100% change in treatment from one level to another, which is impossible.)
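For concreteness, a tiny worked illustration of that linear reading, using the 0.313 estimate from the table above (the probability shifts below are made-up values):

```python
# Illustrative arithmetic only: the estimated ATE for treatment 5 vs. treatment 0.
ate_t5_vs_t0 = 0.313

# A shift of p in the probability of receiving treatment 5 instead of treatment 0
# changes the predicted probability of a severe outcome by roughly ate * p.
for p in (0.25, 0.50, 1.00):
    print(f"shift of {p:.0%}: change in P(severe) ~ {ate_t5_vs_t0 * p:.3f}")
```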
@kbattocchi Hi Keith, thanks for the reply. I am working with accident data. The treatment variable taken here is the relative velocity: '5' indicates more than 80 km/h and '0' is 20 km/h. The target variable is injury severity. I expected a result that would say: if the relative velocity changes from 0 to 5, it increases the injury severity probability by x. But the way you interpreted it is a little different.
- So is the use case not applicable here? In general, I want to analyse the parameters from the accident database and their influence on injury severity, which is a categorical variable.
- As you mentioned, the ATE here is linear, so should I use a treatment featurizer?
@ellpri I think that my answer is consistent with what you're looking for: changing 100% from '0' to '5' means changing the severity probability by 100% of 0.313, i.e. increasing it by 0.313. I only added the caveat because the linearity of the model is not necessarily completely realistic for discrete outcomes. We perform the estimate conditional on X by regressing the unexpected variation in the outcome conditional on X and W on the unexpected variation in the treatment conditional on X and W, and empirically it's possible that for some X there was a big unexpected change in Y (there was a severe injury when we thought that was only 10% likely given X, say) but only a small unexpected change in T (the relative velocity was 5, and we thought that was 95% likely given X). In that case it looks like a very small change in T leads to a big change in Y, which will extrapolate to a more than 100% change in outcome given a change in treatment from '0' to '5'. Despite this, empirically DML seems to generally perform well with discrete outcomes, even though theoretically something like a "double machine learning for logistic regression" setup might be more appropriate.
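To make that residual-on-residual step concrete, here is a minimal sketch of the two-stage idea using plain scikit-learn models on synthetic placeholder data. The arrays `X`, `T`, `Y` and the simple linear final stage are illustrative assumptions, not the exact CausalForestDML internals:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

# Synthetic placeholder data standing in for the accident dataset:
# X = features, T = discrete velocity level (0..3 here), Y = binary severity.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
T = rng.integers(0, 4, size=1000)
Y = (rng.random(1000) < 0.2 + 0.1 * (T == 3)).astype(int)

# Stage 1: out-of-fold predictions of the outcome and of the treatment-level
# probabilities given X (the "expected" parts referred to above).
y_hat = cross_val_predict(RandomForestRegressor(random_state=0), X, Y, cv=3)
t_prob = cross_val_predict(RandomForestClassifier(random_state=0), X, T, cv=3,
                           method="predict_proba")

# Stage 2: regress the unexpected variation in Y on the unexpected variation
# in the treatment indicators (baseline level 0 dropped). The coefficients
# play the role of per-level effects; a causal forest makes this final stage
# flexible in X instead of a single linear fit.
t_onehot = np.eye(t_prob.shape[1])[T][:, 1:]
y_res = Y - y_hat
t_res = t_onehot - t_prob[:, 1:]
effects = LinearRegression(fit_intercept=False).fit(t_res, y_res).coef_
print(effects)  # approximate effect of each level relative to level 0
```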
The treatment featurizer won't affect this: you're already fitting a CATE model that is flexible in X (because you are using CausalForestDML), so featurizing X won't buy you anything. The linearity I'm talking about is linearity in the treatment (probability), and with discrete treatments the model is linear in the treatment without loss of generality.
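As a practical follow-up (a sketch, not something stated in the thread): because the discrete treatment is already one-hot encoded internally, the fitted estimator can report effects and confidence intervals for each velocity level against the baseline directly, e.g. for the 0 → 5 comparison used above:

```python
# Hedged sketch: per-individual effects and confidence intervals from the
# fitted estimator above; `X_test` reuses the thread's variable name.
eff = econml_causalForest.effect(X_test, T0=0, T1=5)                    # CATE for 0 -> 5
lb, ub = econml_causalForest.effect_interval(X_test, T0=0, T1=5, alpha=0.05)
print(eff.mean(), lb.mean(), ub.mean())
```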