oegedijk / explainerdashboard

Quickly build Explainable AI dashboards that show the inner workings of so-called "blackbox" machine learning models.

Home Page: http://explainerdashboard.readthedocs.io


Aggregated SHAP values for one hot encoded features are overestimated

epetrovski opened this issue · comments

It seems that the ExplainerDashboard is summing the absolute SHAP values across all the dummy columns of a one-hot encoded variable when generating the "Feature Importance" graph with grouped categorical variables (see the sketches below).
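For context, the grouping I mean is the one enabled with the `cats` argument, roughly as in the README's Titanic example (the model and column names below are just the demo setup, not my actual data):

```python
from sklearn.ensemble import RandomForestClassifier
from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from explainerdashboard.datasets import titanic_survive

X_train, y_train, X_test, y_test = titanic_survive()
model = RandomForestClassifier(n_estimators=50).fit(X_train, y_train)

# Group the one-hot encoded columns (Sex_male/Sex_female, Deck_A..Deck_G, ...)
# back into single categorical features; the grouped "Feature Importance"
# bars then appear to be the sum of the dummies' absolute SHAP values.
explainer = ClassifierExplainer(model, X_test, y_test, cats=['Sex', 'Deck'])
ExplainerDashboard(explainer).run()
```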

For example, if the aggregated absolute SHAP value for Sex_Male is 100, for Sex_Female 200, and for Sex_Other 500, the grouped value is reported as 800. But since these categories are mutually exclusive, the sum does not make sense; a mean over the non-zero values across all instances of Sex_* should be used instead. Summing SHAP values is only meaningful if each observation could actually take on all of the values, which is not the case with one-hot encoded variables.
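A small numpy sketch of what I mean (simulated SHAP values, not taken from a real model): with k dummy columns, the summed aggregation grows with k, while a mean over the dummy that is actually active for each row does not:

```python
import numpy as np

rng = np.random.default_rng(42)
n_rows, n_cats = 1000, 20   # hypothetical one-hot encoded feature with 20 categories

# Simulated SHAP values for the dummy columns of one categorical feature:
# the "active" dummy (the category the row actually belongs to) carries a
# sizeable contribution, the inactive dummies carry small but non-zero ones.
active = rng.integers(0, n_cats, size=n_rows)
shap_dummies = rng.normal(0.0, 0.04, size=(n_rows, n_cats))       # inactive columns
shap_dummies[np.arange(n_rows), active] = rng.normal(0.5, 0.1, size=n_rows)

# Aggregation as I understand the dashboard does it: sum the per-column
# mean |SHAP| over every dummy column -> inflates with the number of categories.
summed = np.abs(shap_dummies).mean(axis=0).sum()

# One reading of the alternative I am proposing: average |SHAP| over the
# dummy that is actually "hot" for each row -> independent of cardinality.
mean_active = np.abs(shap_dummies[np.arange(n_rows), active]).mean()

print(f"summed over {n_cats} dummies: {summed:.2f}")      # grows with n_cats
print(f"mean over active dummies:    {mean_active:.2f}")  # stays around 0.5
```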

I discovered this after getting absurdly high SHAP values in the dashboard for a one-hot encoded feature with 200+ categories, even though that particular feature was only the 10th most important when I used permutation importance from sklearn.

The same is true in the example on this project's front page, where the "Deck" category becomes hugely important when grouped, even more so than PassengerClass, simply because it has many categories. But a passenger can only be assigned to a single deck, so it makes little sense to look at the accumulated values.