SHAP values with Random Forest report a positive contribution (0.111) for features that are not used in splitting.
adallak opened this issue
Aramayis Dallakyan commented
If I run the following code using DRF with a maximum depth of 1 and ntrees = 1, only one feature is used for splitting.
However, for the specific row, the SHAP values for the features that were not used for splitting have a positive contribution, which violates the theoretical properties of SHAP. On the other hand, if I run the same configuration with gradient boosting, the results match expectations, i.e., unused features have a contribution of 0.
import h2o
from h2o.estimators import H2OGradientBoostingEstimator
from h2o.estimators import H2ORandomForestEstimator
h2o.init()
# Import the prostate dataset into H2O:
prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv")
# Set the predictors and response; set the factors:
prostate["CAPSULE"] = prostate["CAPSULE"].asfactor()
predictors = ["ID","AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
response = "CAPSULE"
# Build and train the model using RF:
pros_gbm = H2ORandomForestEstimator(ntrees=1,
                                    max_depth=1,
                                    seed=1111)
pros_gbm.train(x=predictors, y=response, training_frame=prostate)
pros_gbm.shap_explain_row_plot(frame=prostate, row_index=3)
GBM implementation
# Build and train the model using GBM:
pros_gbm = H2OGradientBoostingEstimator(ntrees=1,
                                        max_depth=1,
                                        seed=1111)
pros_gbm.train(x=predictors, y=response, training_frame=prostate)
pros_gbm.shap_explain_row_plot(frame=prostate, row_index=3)
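The dummy (null-feature) property the report relies on says that a feature the model never consults must receive a Shapley value of exactly 0. This can be checked independently of H2O with a brute-force Shapley computation over all feature subsets; the stump model and the background point below are hypothetical stand-ins for a depth-1 tree and a baseline sample, not H2O's TreeSHAP implementation:

```python
import math
from itertools import combinations

# Hypothetical depth-1 "tree": the prediction depends only on feature 0.
def stump(x):
    return 1.0 if x[0] > 0.5 else 0.0

def shapley_values(model, x, background):
    """Exact Shapley values by subset enumeration.

    Features absent from a coalition are filled in from a single
    background (baseline) point, mirroring the interventional
    value function commonly used for tree explainers.
    """
    n = len(x)
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for r in range(len(others) + 1):
            for s in combinations(others, r):
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                w = (math.factorial(len(s)) *
                     math.factorial(n - len(s) - 1) / math.factorial(n))
                with_i = [x[j] if (j in s or j == i) else background[j]
                          for j in range(n)]
                without_i = [x[j] if j in s else background[j]
                             for j in range(n)]
                phi += w * (model(with_i) - model(without_i))
        phis.append(phi)
    return phis

phis = shapley_values(stump, [0.9, 0.2, 0.7], [0.1, 0.5, 0.5])
print(phis)  # features 1 and 2, never used by the stump, get exactly 0.0
```

Since the stump's output never changes when features 1 or 2 are toggled between their foreground and background values, every marginal contribution for those features is 0, so their Shapley values are identically 0. A nonzero value for an unused feature, as in the DRF output above, therefore cannot come from a correct Shapley computation on a single tree.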
Kazuho Oku commented
I believe you are referring to a different h2o? https://github.com/h2oai