py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.

Home Page: https://www.microsoft.com/en-us/research/project/alice/

`shap_values` for tree-based models doesn't set `check_additivity=False` as expected

jcreinhold opened this issue

For, e.g., `CausalForestDML`, the desired behavior (#458) of EconML is to set `check_additivity=False` when computing `shap_values` (otherwise, the computation frequently fails for reasons that are unclear to me).
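For reference, here is a minimal sketch (not EconML's internals) of the pattern in question: shap tree explainers take a `check_additivity` keyword when called, and passing `False` skips the consistency check that sporadically fails. The model and data are toy stand-ins.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Toy model and data, standing in for the final-stage forest EconML explains.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
# check_additivity=False disables the check that SHAP values sum to the model output.
shap_out = explainer(X, check_additivity=False)
```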

This kwarg is set here: if the explainer's class is `Tree`, then `check_additivity` is set to `False`.

This works with `shap<=0.42.1`; however, in `shap>=0.43.0` the name of the class was changed to `TreeExplainer`.
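A quick way to see which name an installed release uses (assuming the long-standing `shap.TreeExplainer` alias points at the same class in both version ranges):

```python
import shap

# Per the rename described above: prints "Tree" on shap<=0.42.1
# and "TreeExplainer" on shap>=0.43.0.
print(shap.TreeExplainer.__name__)
```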

Given that EconML states it's compatible with `shap >= 0.38.1, < 0.44.0`, this represents a bug.

Some potential solutions:

1. Lower the maximum supported version of shap from 0.44.0 to 0.42.1. (This doesn't seem desirable.)
2. Change the code to:

```python
# Simple, but potentially not future-proof.
if explainer.__class__.__name__ in {"Tree", "TreeExplainer"}:
    shap_out = explainer(F, check_additivity=False)
```

3. Change the code to:

```python
# This might have false positives but might be more future-proof.
if "Tree" in explainer.__class__.__name__:
    shap_out = explainer(F, check_additivity=False)
```

4. Change the code to:

```python
import inspect

# This seems like the most robust option and will always be correct.
if "check_additivity" in inspect.signature(explainer).parameters:
    shap_out = explainer(F, check_additivity=False)
```

(It seems that this parameter is only used by the `TreeExplainer` and the `DeepExplainer`, so the signature inspection would probably be fine; in any case, we probably want to set this kwarg to `False` all the time, at least in this package.)
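As a sanity check on option 4: `inspect.signature` applied to a callable instance reflects its `__call__` method, which is why the lookup behaves as intended on explainer objects. A toy illustration (the classes here are made-up stand-ins, not shap's):

```python
import inspect

class FakeTreeExplainer:
    """Made-up stand-in for an explainer that accepts the kwarg."""
    def __call__(self, X, check_additivity=True):
        return X

class FakeKernelExplainer:
    """Made-up stand-in for an explainer that does not."""
    def __call__(self, X):
        return X

# signature() of an instance inspects __call__, so the check discriminates correctly.
assert "check_additivity" in inspect.signature(FakeTreeExplainer()).parameters
assert "check_additivity" not in inspect.signature(FakeKernelExplainer()).parameters
```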

Happy to submit a PR with whichever approach, though.

Thanks for the feedback - I agree that option 4 seems like the best approach, and we'd be happy to take a PR fixing it. I am surprised that our existing tests in test_shap.py don't catch this (because they do cover CausalForestDML and they run against shap==0.43); please add a test that fails without your fix (or modify at least one of the existing tests to do so).
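A hypothetical sketch of what such a regression test might look like (the data and estimator setup are illustrative, not drawn from test_shap.py; as the discussion below notes, deterministically triggering the additivity failure is the hard part):

```python
import numpy as np
from econml.dml import CausalForestDML

def test_shap_values_disables_additivity_check():
    # Synthetic data mirroring the discrete-treatment setup in the issue.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(500, 2))
    T = rng.binomial(1, 0.5, size=500)
    Y = T * X[:, 0] + rng.normal(size=500)

    est = CausalForestDML(discrete_treatment=True, random_state=0)
    est.fit(Y, T, X=X)

    # Should not raise, since EconML is expected to pass check_additivity=False
    # to the shap tree explainer under the hood.
    est.shap_values(X)
```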

Also, we'd be happy to allow a higher upper bound (say shap<0.46) if there are not other breaking changes that are hard to work around; the only reason that we have a specific upper bound is that shap releases have had breaking changes that have broken functionality in the past.
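For concreteness, the relaxed pin might look like the following; the exact file and surrounding metadata in EconML's build configuration are assumptions here.

```python
# setup.py-style sketch of the proposed wider bound.
install_requires = [
    "shap >= 0.38.1, < 0.46",
]
```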

@kbattocchi I'm not able to consistently reproduce this error. See #445 for another valiant but ultimately futile effort to reproduce it consistently.

For posterity, here is (essentially) the code I was running when I (occasionally) ran into the error:

Versions:

  • Python version: 3.8.17
  • econml version: 0.15.0
  • numpy version: 1.23.5
  • scikit-learn version: 1.3.2
  • scipy version: 1.10.1
  • shap version: 0.43.0

```python
import numpy as np
import sklearn.ensemble as ens
from econml import dml

seed = 1337
rng = np.random.default_rng(seed)

n = 1_000
n_x = 2

# Covariates, a treatment correlated with X, and an outcome depending on both.
X = rng.uniform(-1.0, 1.0, size=(n, n_x))
A = rng.binomial(1, 0.5 + 0.5 * X[:, 0])
Y = rng.binomial(1, 0.3 + 0.5 * A + 0.2 * X[:, 0])

# Fold the original treatment into the features and draw a fresh random treatment.
X_ = np.hstack([X, A[:, None]])
A_ = rng.binomial(1, 0.5, size=(n,))

model = dml.CausalForestDML(
    model_y=ens.RandomForestClassifier(random_state=seed),
    model_t=ens.RandomForestClassifier(random_state=seed),
    discrete_treatment=True,
    discrete_outcome=True,
    random_state=seed,
)

model.fit(Y, A_, X=X_)
shap_values = model.shap_values(X_)
```

Part of the problem with reproducing this error is that EconML doesn't currently allow you to pass a seed to the Explainer class.

I'd be happy to add that keyword argument to the `shap_values` method; however, that would involve changing a good number of files (e.g., the `LinearCateEstimator` and anything that overrides that method).

This change isn't strictly necessary for this issue (although I believe it's generally desirable), and, even if I do find a configuration that triggers the error on my machine, I doubt it'll be reproducible across machines.

Any alternative ideas? I can quickly submit a PR with just the change proposed in option 4 and we could revisit adding the seed keyword argument to shap_values (if that's desirable).
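A hypothetical sketch of what threading a seed through might look like; the helper name is made up, and while recent `shap.Explainer` versions do accept a `seed` argument, that should be confirmed against the pinned shap range.

```python
import shap

# Illustrative only: an optional `seed` kwarg passed from shap_values
# down to the explainer construction.
def _make_explainer(model, background, seed=None):
    # shap.Explainer exposes a `seed` parameter in recent releases
    # (worth verifying for the versions EconML supports).
    return shap.Explainer(model, background, seed=seed)
```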

I agree that allowing the seed to be passed as an optional argument seems beneficial, not just for testing but for reproducibility more broadly. If you don't mind adding these changes to your PR, that would be great, but if that's too much work I'm also happy to merge it as-is.