mmschlk / shapiq

Shapley Interactions for Machine Learning

Home Page: https://shapiq.readthedocs.io

Error in TreeSHAPIQ and regression of the 'bike' dataset

hbaniecki opened this issue

Why does a simple random forest model on the 'bike' dataset now achieve an R^2 of 0.58, when it previously achieved 0.85?
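For reference, assuming the R^2 figures come from the `score` method of a scikit-learn regressor (as in the example in this issue), they are the coefficient of determination, 1 - SS_res / SS_tot. A minimal sketch of that formula in plain Python follows; the helper name `r2_score_manual` and the toy numbers are made up for illustration and are not part of shapiq or scikit-learn.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Hypothetical helper for illustration. Undefined (division by zero)
    when all values in y_true are identical.
    """
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]

# Perfect predictions give R^2 = 1.
perfect = r2_score_manual(y_true, y_true)

# Always predicting the mean of y_true (6.0) gives R^2 = 0.
baseline = r2_score_manual(y_true, [6.0] * len(y_true))

print(perfect, baseline)
```

Because the score is measured relative to the variance of the target, a change in the data that the model is trained or evaluated on can shift it even when the model code is unchanged.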

TreeSHAPIQ fails with an error in this example:

import shapiq
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

X, y = shapiq.load_bike()
X_train, X_test, y_train, y_test = train_test_split(X.values, y.values, test_size=0.2, random_state=42)
n_features = X.shape[1]

model = RandomForestRegressor(n_estimators=1000, max_depth=5, max_features="sqrt", random_state=42)
model.fit(X_train, y_train)
print(f"Train R2: {model.score(X_train, y_train):.4f}")
print(f"Val R2: {model.score(X_test, y_test):.4f}")

explainer = shapiq.TreeExplainer(
    model=model,
    interaction_type="SII",
    max_order=2,
    min_order=1
)
x = X_test[0]
interaction_values = explainer.explain(x)
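The report does not include the error message itself. One way to capture the full traceback for a failing call, so that it can be pasted into an issue, is sketched below using only the standard library. The names `capture_traceback` and `failing_call` are hypothetical; `failing_call` is a stand-in for a call such as `explainer.explain(x)` and does not reproduce the actual TreeSHAPIQ error.

```python
import traceback


def capture_traceback(fn, *args, **kwargs):
    """Call fn; return (result, None) on success or (None, traceback text) on failure.

    Hypothetical helper for illustration only.
    """
    try:
        return fn(*args, **kwargs), None
    except Exception:
        # format_exc() returns the traceback of the exception being handled.
        return None, traceback.format_exc()


def failing_call(x):
    # Stand-in for the failing explainer call; the message is invented.
    raise ValueError(f"could not explain instance of length {len(x)}")


result, tb_text = capture_traceback(failing_call, [1.0, 2.0, 3.0])
print(tb_text)
```

In the example above, the same helper could be applied as `capture_traceback(explainer.explain, x)`.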

But it seems to work on synthetic data:

import shapiq
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=100,
    n_features=7,
    random_state=42,
    n_classes=3,
    n_informative=7,
    n_repeated=0,
    n_redundant=0,
)
model = DecisionTreeClassifier(random_state=42, max_depth=2)
model.fit(X, y)

explainer = shapiq.TreeExplainer(model, max_order=2, min_order=1, interaction_type="SII")

x_explain = X[0]
explanation = explainer.explain(x_explain)
print(explanation)