matheusfacure / python-causality-handbook

Causal Inference for the Brave and True. A light-hearted yet rigorous approach to learning about impact estimation and causality.

Home Page: https://matheusfacure.github.io/python-causality-handbook/landing-page.html


Issue on page /22-Debiased-Orthogonal-Machine-Learning.html

SebKrantz opened this issue

I fail to understand why, in the section "Non-Scientific Double/Debiased ML", it is necessary to save the first-stage models and predict with them. When adding counterfactual treatments, we are not changing any part of the covariates X, which are the sole input to the first-stage models. Thus the first-stage predictions are the same with or without the counterfactual treatments, and we don't need to keep those models.
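A quick sketch of this point (the data frame, column names, and covariates here are made up, not the chapter's): since the debiasing/denoising models take only X, their out-of-fold predictions can be computed once and reused for any counterfactual treatment grid.

```python
# Hypothetical data; only the structure matters for the argument.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({"cost": rng.normal(size=n),
                   "weekday": rng.integers(0, 7, size=n),
                   "temp": rng.normal(25, 5, size=n)})
df["price"] = 5 + df["cost"] + rng.normal(size=n)
df["sales"] = 100 - 3 * df["price"] + 2 * df["temp"] + rng.normal(size=n)

X_cols = ["cost", "weekday", "temp"]

# First-stage predictions depend only on X ...
t_hat = cross_val_predict(GradientBoostingRegressor(), df[X_cols], df["price"], cv=5)
y_hat = cross_val_predict(GradientBoostingRegressor(), df[X_cols], df["sales"], cv=5)

# ... so for any counterfactual price level, the treatment residual is just
# (counterfactual price - t_hat); the fitted first-stage models are never needed again.
price_grid = np.linspace(df["price"].min(), df["price"].max(), 5)
t_res_cf = {p: p - t_hat for p in price_grid}
```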

In addition, I don't quite understand the value of the train/test split and the ensamble_pred() function here. If my goal is to get counterfactual predictions for all of my data (which is typically the case), I would just use cross_val_predict() to get the first-stage residuals on the entire data (as in the section on DML), then fit cross-validated final models with cv_estimate(), additionally saving the indices of each fold, and finally write a predict method that uses the final-stage models and the saved indices to produce proper cross-validated final predictions for different price levels (after subtracting from each price level its first-stage prediction, which remains the same).
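A minimal sketch of that alternative, reusing t_hat, y_hat, and price_grid from the snippet above. The cv_estimate() here is my own simplified version that also returns each fold's held-out indices (the chapter's function may differ), and predict_counterfactual() is a hypothetical helper, not the book's API.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

# residuals on the entire data, from the out-of-fold first-stage predictions
t_res = df["price"] - t_hat
y_res = df["sales"] - y_hat

def cv_estimate(X, y, n_splits=5):
    """Fit one final-stage model per fold and keep each fold's test indices."""
    models, fold_indices = [], []
    for train_idx, test_idx in KFold(n_splits).split(X):
        m = GradientBoostingRegressor().fit(X.iloc[train_idx], y.iloc[train_idx])
        models.append(m)
        fold_indices.append(test_idx)
    return models, fold_indices

# Final stage: predict the outcome residual from the treatment residual and X.
final_X = df[X_cols].assign(t_res=t_res)
models, fold_indices = cv_estimate(final_X, y_res)

def predict_counterfactual(price):
    """Cross-validated final-stage predictions when everyone gets `price`.
    The first-stage fit is unchanged, so only the treatment residual moves."""
    X_cf = df[X_cols].assign(t_res=price - t_hat)
    pred = np.empty(len(df))
    for m, idx in zip(models, fold_indices):
        pred[idx] = m.predict(X_cf.iloc[idx])  # each model scores only its held-out fold
    return pred

# e.g. counterfactual predictions across the price grid
cf_preds = {p: predict_counterfactual(p) for p in price_grid}
```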