uber / causalml

Uplift modeling and causal inference with machine learning algorithms

How to deal with observational data?

NHUV opened this issue

Hello. I would like to create an uplift model to prioritize the best customers to contact. Since observational data is available, I would prefer to use it, as that is less time-consuming than setting up an experiment. Are there any suggestions on how to deal with observational data (e.g., in order to satisfy the unconfoundedness assumption)? I am thinking about incorporating the following methods:

  1. Propensity score matching. However, I wonder whether this is already applied under the hood when using an X-learner, for example, since I saw that a propensity score is generated if the user doesn't provide one. Can you elaborate on this? I didn't find an implementation example for PSM in the notebooks; is it correct that no such example is available?
  2. Outcome-adaptive lasso. As stated with reference to [29] (Shortreed, S.M., Ertefaie, A.: Outcome-adaptive lasso: variable selection for causal inference. Biometrics 73(4), 1111–1122 (2017)): "Feature selection algorithms for observational causal inference, such as the lasso-based approach proposed by [29], are designed to help models whose goal is to reduce confounding". Is there a reason this method is not incorporated in the Causal ML package? Does a filter feature selection method suffice if we apply matching (as touched upon in 1.)?

Really looking forward to your recommendations for developing an uplift model with observational data.

Thank you!

In observational causal inference, the most important step is forming a clear understanding of the possible confounding variables for the causal relationship you are trying to measure. As things stand, this can only be done by reasoning qualitatively about the specific problem you're trying to solve, ideally together with other people who are knowledgeable about it. Neither Causal ML nor any other current software package can help you with this.

Once you've defined your set of confounding variables, you can use any of the many estimation methods out there. The most common one is a simple multiple linear regression with the treatment indicator and the confounders as covariates; you can use statsmodels, DoWhy, etc. for this. The methods implemented in Causal ML (like X-learner, R-learner) will also work, but they're most likely overkill.
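
To make the regression suggestion concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the single confounder `x`, the treatment-assignment rule, and the true effect of 2.0 are invented, and plain NumPy least squares stands in for a statsmodels fit (`sm.OLS(y, design).fit()` would give the same point estimate plus standard errors). It only shows how adjusting for a known confounder changes the estimate; it does not tell you which variables are confounders in your data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical confounder: influences both who gets contacted and the outcome.
x = rng.normal(size=n)

# Non-random treatment assignment: customers with high x are contacted more often.
p_treat = 1.0 / (1.0 + np.exp(-1.5 * x))
t = rng.binomial(1, p_treat)

# Simulated outcome: the true treatment effect is 2.0, and x also raises y directly.
y = 2.0 * t + 3.0 * x + rng.normal(size=n)

# Naive estimate: difference in mean outcome, ignoring the confounder.
naive = y[t == 1].mean() - y[t == 0].mean()

# Adjusted estimate: least squares of y on [intercept, treatment, confounder].
# The coefficient on the treatment column is the effect estimate.
design = np.column_stack([np.ones(n), t, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted = coef[1]

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

In this simulation the naive difference in means overstates the effect because contacted customers also have higher `x`, while the adjusted coefficient lands close to the simulated value of 2.0. With real data the quality of the adjusted estimate depends entirely on whether the covariates you include really cover the confounders.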