py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.

Confidence Interval for categorical outcome

ellpri opened this issue

Hi @kbattocchi (Keith), I am building a causal forest where the treatment variable is multi-categorical ([0, 1, 2, 3, 5]) and the outcome is binary ([0, 1]), where 1 means severe.

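from econml.dml import CausalForestDML
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

econml_causalForest = CausalForestDML(model_y=RandomForestRegressor(random_state=42),
                                      model_t=RandomForestClassifier(min_samples_leaf=10, random_state=42),
                                      discrete_treatment=True, cv=3, random_state=123)
# Y_train (binary severity outcome) is an assumed name; the outcome argument was cut off in the original snippet
econml_causalForest.fit(Y_train, T=T_train, X=X_train, W=None)
print(f'econml_ATE_forest: {econml_causalForest.ate(X_test, T0=0, T1=5)}')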


I got the following results:

 Doubly Robust ATE on Training Data Results          
         point_estimate stderr zstat  pvalue ci_lower ci_upper
ATE|T0_1          0.128   0.02  6.402    0.0    0.089    0.167
ATE|T0_2          0.143  0.019  7.596    0.0    0.106     0.18
ATE|T0_3          0.164   0.02   8.35    0.0    0.126    0.203
ATE|T0_5          0.313   0.02 15.827    0.0    0.274    0.352

econml_ATE_forest: 0.27076799164408494
               Uncertainty of Mean Point Estimate              
mean_point stderr_mean zstat pvalue ci_mean_lower ci_mean_upper
     0.109       1.059 0.103  0.918        -1.968         2.185
      Distribution of Point Estimate     
std_point pct_point_lower pct_point_upper
    0.946          -0.263           0.233
     Total Variance of Point Estimate     
stderr_point ci_point_lower ci_point_upper
       1.421         -0.374          0.377

Which results should I rely on, the doubly robust ATE or the DML ate()? The two ATE estimates differ. And how should I interpret the ATE and the CI?

If you only care about the ATE on the training set, use the doubly robust ATE (available programmatically via the ate_ attribute). The ate() method is more flexible, since it also lets you compute the ATE for other populations X, but it is not doubly robust.
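Why a doubly robust estimate can differ from an average of model predictions is easiest to see with a generic AIPW (augmented inverse probability weighting) estimator. The sketch below is not EconML's internal ate_ computation; it assumes a binary treatment, a single binary confounder, synthetic data with a true ATE of 0.2, and a deliberately misspecified outcome model that ignores the confounder. Averaging that model's predictions inherits its bias, while the doubly robust correction repairs it because the propensity model is correct.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

# Synthetic data (hypothetical): x confounds both treatment and outcome.
x = (rng.uniform(size=n) < 0.5).astype(float)
t = (rng.uniform(size=n) < 0.2 + 0.6 * x).astype(float)           # P(T=1 | x)
y = (rng.uniform(size=n) < 0.1 + 0.2 * t + 0.4 * x).astype(float)  # true ATE = 0.2

# Propensity model: share treated within each value of x (correctly specified).
e_hat = np.where(x == 1, t[x == 1].mean(), t[x == 0].mean())

# Outcome model that ignores x (misspecified on purpose).
mu1 = np.full(n, y[t == 1].mean())
mu0 = np.full(n, y[t == 0].mean())

# Plug-in: average the model's predicted effect over the sample.
plug_in = float(np.mean(mu1 - mu0))

# AIPW: plug-in plus an inverse-propensity-weighted residual correction.
aipw = float(np.mean(mu1 - mu0
                     + t * (y - mu1) / e_hat
                     - (1 - t) * (y - mu0) / (1 - e_hat)))

print(f"plug-in: {plug_in:.3f}, AIPW: {aipw:.3f}")
```

Here the plug-in lands near the confounded difference of 0.44, while AIPW recovers roughly 0.2. With a well-specified outcome model the two would agree closely; the correction matters when one of the nuisance models is off.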

In terms of interpretation, a value of 0.313 means that increasing the probability of assigning an individual to treatment 5 instead of treatment 0 by p increases the likelihood of a severe outcome by 0.313p. (This estimate is linear in the treatment probability, which may not be completely realistic for a discrete outcome. For some values of X, small variations in treatment may correspond to large variations in outcome; extrapolated to a 100% change in treatment from one level to another, these would imply more than a 100% change in the probability of a severe outcome, which is impossible.)
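The arithmetic behind this reading, plus the confidence interval in the ATE|T0_5 row, can be checked in a few lines of plain Python. This is a sketch: the function names are hypothetical, the baseline risk of 0.10 and local effect of 1.5 are made-up values chosen to show an impossible extrapolation, and treating ci_lower/ci_upper as a normal-approximation interval (point ± z·stderr at alpha = 0.05) is an assumption that happens to reproduce the reported numbers.

```python
from statistics import NormalDist

def implied_risk_change(theta, delta_p):
    """Change in P(severe) implied by a linear effect estimate theta when the
    probability of treatment 5 (instead of 0) rises by delta_p."""
    return theta * delta_p

def is_feasible(baseline_risk, theta, delta_p=1.0):
    """A linear extrapolation is coherent only if the implied risk stays in [0, 1]."""
    new_risk = baseline_risk + implied_risk_change(theta, delta_p)
    return 0.0 <= new_risk <= 1.0

def normal_ci(point, stderr, alpha=0.05):
    """Two-sided normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return point - z * stderr, point + z * stderr

# ATE|T0_5 row of the doubly robust summary: point 0.313, stderr 0.02.
for delta_p in (0.25, 0.5, 1.0):
    change = implied_risk_change(0.313, delta_p)
    print(f"delta_p={delta_p}: change in P(severe) = {change:.4f}")

lo, hi = normal_ci(0.313, 0.02)
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")  # [0.274, 0.352], matching the table

# Hypothetical local effect for some X: baseline risk 0.10, estimated effect 1.5.
print(is_feasible(0.10, 1.5))  # False: implies a 160% probability of severe injury
```

Read this way, the interval says the data are consistent with moving from treatment 0 to treatment 5 raising the probability of a severe outcome by roughly 0.27 to 0.35.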

@kbattocchi Hi Keith, thanks for the reply. I am working with accident data. The treatment variable here is relative velocity, where '5' means more than 80 km/h and '0' means 20 km/h. The target variable is injury severity. I expected a result saying that if relative velocity changes from 0 to 5, the probability of severe injury increases by x, but your interpretation is a little different.

  1. So is this use case not applicable here? In general, I want to analyse the parameters in the accident database and their influence on injury severity, which is a categorical variable.
  2. As you mentioned, the ATE here is linear, so should I use a treatment featurizer?

@ellpri I think my answer is consistent with what you're looking for: changing the treatment 100% from '0' to '5' changes the severity probability by 100% of 0.313, i.e. increases it by 0.313. I only added the caveat because the linearity of the model is not necessarily completely realistic for discrete outcomes. We estimate the effect conditional on X by regressing the unexpected variation in outcome (conditional on X and W) on the unexpected variation in treatment (conditional on X and W). Empirically, it's possible that for some X there was a big unexpected change in Y (say, a severe injury when we thought it was only 10% likely given X) but only a small unexpected change in T (the relative velocity was '5', and we thought that was 95% likely given X). In that case it looks as if a very small change in T leads to a big change in Y, which extrapolates to a more than 100% change in outcome for a change in treatment from '0' to '5'. Despite this, DML empirically seems to perform well with discrete outcomes in general, even though theoretically something like a "double machine learning for logistic regression" setup might be more appropriate.
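The residual-on-residual regression described above can be sketched with NumPy. Several simplifications are assumptions made for illustration: linear regressions stand in for the random forests as nuisance models, the treatment is binary rather than multi-level, the data are synthetic with a true effect of 0.3, and one global slope is estimated (roughly speaking, CausalForestDML solves a locally weighted version of this final-stage regression for each X). The last lines reproduce the single observation from the answer, where a 0.9 outcome residual paired with a 0.05 treatment residual implies a slope of 18.

```python
import numpy as np

def ols_fit_predict(x_train, target_train, x_test):
    """Linear-regression nuisance model (a stand-in for the random forests)."""
    a_train = np.column_stack([np.ones(len(x_train)), x_train])
    a_test = np.column_stack([np.ones(len(x_test)), x_test])
    beta, *_ = np.linalg.lstsq(a_train, target_train, rcond=None)
    return a_test @ beta

def dml_residuals(y, t, x, n_folds=3, seed=0):
    """Cross-fitted residuals: each fold is predicted by models fit on the other folds."""
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(y)) % n_folds
    y_res = np.empty(len(y))
    t_res = np.empty(len(t))
    for k in range(n_folds):
        train, test = fold != k, fold == k
        y_res[test] = y[test] - ols_fit_predict(x[train], y[train], x[test])
        t_res[test] = t[test] - ols_fit_predict(x[train], t[train], x[test])
    return y_res, t_res

def final_stage_slope(y_res, t_res):
    """Regress outcome residuals on treatment residuals without an intercept."""
    return float(np.dot(y_res, t_res) / np.dot(t_res, t_res))

# Synthetic data (hypothetical): binary treatment and outcome, both confounded by x.
rng = np.random.default_rng(42)
n = 20_000
x = rng.uniform(0.0, 1.0, size=(n, 1))
t = (rng.uniform(size=n) < 0.2 + 0.6 * x[:, 0]).astype(float)
y = (rng.uniform(size=n) < 0.1 + 0.3 * t + 0.4 * x[:, 0]).astype(float)  # true effect 0.3

y_res, t_res = dml_residuals(y, t, x)
pooled_theta = final_stage_slope(y_res, t_res)

# Single observation from the answer: severe injury predicted 10% likely,
# treatment '5' predicted 95% likely.
single_theta = final_stage_slope(np.array([1.0 - 0.10]), np.array([1.0 - 0.95]))

print(f"pooled estimate: {pooled_theta:.3f}")
print(f"slope implied by the single observation: {single_theta:.1f}")
```

Pooled over many observations the slope is close to the true 0.3, but in a neighbourhood of X dominated by observations like the single one, the local slope of 18 would imply an impossible change in probability.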

The treatment featurizer won't affect this. You're already fitting a CATE model that is flexible in X (because you are using CausalForestDML), so featurizing X won't buy you anything; the linearity I'm talking about is linearity in the treatment (probability). And with a discrete treatment, the model is linear in the treatment without loss of generality, since any function of the treatment level can be written as a linear function of the one-hot treatment indicators.
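The "without loss of generality" point can be checked with NumPy least squares: any set of per-level values is reproduced exactly by a linear model in drop-first one-hot indicators, while a straight line in the raw level code (0, 1, 2, 3, 5) generally misses. As per-level values the sketch borrows the doubly robust point estimates from the summary table (effect relative to level 0). The helper name is hypothetical, and although drop-first encoding is the usual convention for discrete treatments, this is not EconML's own code.

```python
import numpy as np

def one_hot_drop_first(t, levels):
    """Indicator columns for every level except the first (the baseline, here 0)."""
    return (t[:, None] == levels[None, 1:]).astype(float)

levels = np.array([0, 1, 2, 3, 5])
# Per-level values taken from the doubly robust table; deliberately nonlinear in the level.
f = np.array([0.0, 0.128, 0.143, 0.164, 0.313])

t = levels  # one row per treatment level
D = np.column_stack([np.ones(len(t)), one_hot_drop_first(t, levels)])
coef, *_ = np.linalg.lstsq(D, f, rcond=None)
max_err_onehot = float(np.max(np.abs(D @ coef - f)))

# For comparison: a straight line in the raw level code.
L = np.column_stack([np.ones(len(t)), t.astype(float)])
coef_lin, *_ = np.linalg.lstsq(L, f, rcond=None)
max_err_linear = float(np.max(np.abs(L @ coef_lin - f)))

print("one-hot coefficients:", np.round(coef, 3))
print(f"max error, one-hot: {max_err_onehot:.2e}; straight line: {max_err_linear:.3f}")
```

The one-hot coefficients are exactly the per-level effects, so a treatment featurizer has nothing left to add for a discrete treatment.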