h2o / h2o

H2O - the optimized HTTP/1, HTTP/2, HTTP/3 server

Home Page: https://h2o.examp1e.net


SHAP values with Random Forest report a positive contribution (0.111) for features that are not used in splitting.

adallak opened this issue

I ran the following code using DRF with max_depth = 1 and ntrees = 1, so only one feature is used for splitting.
However, for the specific row shown, the SHAP values of the features that were not used for splitting are positive, which violates the theoretical properties of SHAP. If I run the same configuration with gradient boosting, the results match expectations, i.e., the unused features have a contribution of 0.

import h2o
from h2o.estimators import H2OGradientBoostingEstimator
from h2o.estimators import H2ORandomForestEstimator
h2o.init()

# Import the prostate dataset into H2O:
prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv")

# Set the predictors and response; set the factors:
prostate["CAPSULE"] = prostate["CAPSULE"].asfactor()
predictors = ["ID","AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
response = "CAPSULE"

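# Build and train the model using DRF (a single tree of depth 1):
pros_drf = H2ORandomForestEstimator(ntrees=1,
                                    max_depth=1,
                                    seed=1111)
pros_drf.train(x=predictors, y=response, training_frame=prostate)

# Plot the SHAP contributions for a single row:
pros_drf.shap_explain_row_plot(frame=prostate, row_index=3)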

GBM implementation

# Build and train the model using GBM:
pros_gbm = H2OGradientBoostingEstimator(ntrees=1,
                                        max_depth=1,
                                        seed=1111)
pros_gbm.train(x=predictors, y=response, training_frame=prostate)

# Plot the SHAP contributions for the same row:
pros_gbm.shap_explain_row_plot(frame=prostate, row_index=3)
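
The expectation stated above, that features not used in any split should receive a contribution of exactly 0, can be illustrated without H2O. The following is a minimal sketch and not H2O's implementation: it computes exact Shapley values by brute force for a hypothetical depth-1 tree. The split feature, threshold, leaf values, cover counts, and the example row are all made up for illustration.

```python
from itertools import combinations
from math import factorial

# Hypothetical depth-1 tree (stump); all numbers are illustrative, not from H2O.
FEATURES = ["AGE", "PSA", "GLEASON"]
SPLIT_FEATURE = "GLEASON"
THRESHOLD = 6.5
LEFT_VALUE, RIGHT_VALUE = 0.2, 0.7      # leaf predictions
LEFT_COVER, RIGHT_COVER = 200, 180      # training rows reaching each leaf


def expected_output(row, known):
    """Stump output when only the features in `known` are observed."""
    if SPLIT_FEATURE in known:
        return LEFT_VALUE if row[SPLIT_FEATURE] < THRESHOLD else RIGHT_VALUE
    # Split feature unknown: average the leaves, weighted by cover.
    total = LEFT_COVER + RIGHT_COVER
    return (LEFT_COVER * LEFT_VALUE + RIGHT_COVER * RIGHT_VALUE) / total


def shapley_values(row):
    """Exact Shapley values by enumerating every feature subset."""
    n = len(FEATURES)
    phi = {}
    for feature in FEATURES:
        others = [f for f in FEATURES if f != feature]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                known = set(subset)
                gain = (expected_output(row, known | {feature})
                        - expected_output(row, known))
                contribution += weight * gain
        phi[feature] = contribution
    return phi


row = {"AGE": 65, "PSA": 9.0, "GLEASON": 7}   # hypothetical row
phi = shapley_values(row)
base_value = expected_output(row, set())           # no features known
prediction = expected_output(row, set(FEATURES))   # all features known
print(phi)
```

In this sketch the two unused features get exactly 0, and the split feature carries the whole difference between the prediction and the base value. That is the behaviour expected from the DRF model above when it is limited to a single split.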

I believe you are referring to a different h2o project: https://github.com/h2oai