SHAP values with Random Forest report a positive contribution (0.111) for features that are not used in splitting.
adallak opened this issue
Aramayis Dallakyan commented
If I run the following code using DRF with a maximum depth of 1 and ntrees = 1, only one feature is used for splitting.
However, for the specific row, the SHAP values for the features that were not used for splitting have a positive contribution, which violates the theoretical properties of SHAP. On the other hand, if I run the same configuration with gradient boosting, the results match expectations, i.e., unused features have a contribution of 0.
import h2o
from h2o.estimators import H2OGradientBoostingEstimator
from h2o.estimators import H2ORandomForestEstimator
h2o.init()
# Import the prostate dataset into H2O:
prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv")
# Set the predictors and response; set the factors:
prostate["CAPSULE"] = prostate["CAPSULE"].asfactor()
predictors = ["ID","AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
response = "CAPSULE"
# Build and train the model using RF:
pros_gbm = H2ORandomForestEstimator(ntrees=1,
                                    max_depth=1,
                                    seed=1111)
pros_gbm.train(x=predictors, y=response, training_frame=prostate)
pros_gbm.shap_explain_row_plot(frame=prostate, row_index=3)
GBM implementation
# Build and train the model using GBM:
pros_gbm = H2OGradientBoostingEstimator(ntrees=1,
                                        max_depth=1,
                                        seed=1111)
pros_gbm.train(x=predictors, y=response, training_frame=prostate)
pros_gbm.shap_explain_row_plot(frame=prostate, row_index=3)
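The dummy (null-feature) property the report relies on says that a feature the model never consults must receive a Shapley value of exactly 0. This can be checked independently of H2O with a brute-force Shapley computation over all feature subsets; the stump model and the background point below are hypothetical stand-ins for a depth-1 tree and a baseline sample, not H2O's TreeSHAP implementation:

```python
import math
from itertools import combinations

# Hypothetical depth-1 "tree": the prediction depends only on feature 0.
def stump(x):
    return 1.0 if x[0] > 0.5 else 0.0

def shapley_values(model, x, background):
    """Exact Shapley values by subset enumeration.

    Features absent from a coalition are filled in from a single
    background (baseline) point, mirroring the interventional
    value function commonly used for tree explainers.
    """
    n = len(x)
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for r in range(len(others) + 1):
            for s in combinations(others, r):
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                w = (math.factorial(len(s)) *
                     math.factorial(n - len(s) - 1) / math.factorial(n))
                with_i = [x[j] if (j in s or j == i) else background[j]
                          for j in range(n)]
                without_i = [x[j] if j in s else background[j]
                             for j in range(n)]
                phi += w * (model(with_i) - model(without_i))
        phis.append(phi)
    return phis

phis = shapley_values(stump, [0.9, 0.2, 0.7], [0.1, 0.5, 0.5])
print(phis)  # features 1 and 2, never used by the stump, get exactly 0.0
```

Since the stump's output never changes when features 1 or 2 are toggled between their foreground and background values, every marginal contribution for those features is 0, so their Shapley values are identically 0. A nonzero value for an unused feature, as in the DRF output above, therefore cannot come from a correct Shapley computation on a single tree.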
Kazuho Oku commented
I believe you are referring to a different h2o? https://github.com/h2oai