Error in TreeSHAPIQ and regression of the 'bike' dataset
hbaniecki opened this issue
Hubert Baniecki commented
Why does a simple random forest (RF) model on the 'bike' dataset now achieve an R^2 of 0.58 instead of the previous 0.85?
TreeSHAPIQ also raises an error in this example:
import shapiq
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
X, y = shapiq.load_bike()
X_train, X_test, y_train, y_test = train_test_split(X.values, y.values, test_size=0.2, random_state=42)
n_features = X.shape[1]
model = RandomForestRegressor(n_estimators=1000, max_depth=5, max_features="sqrt", random_state=42)
model.fit(X_train, y_train)
print('Train R2: {:.4f}'.format(model.score(X_train, y_train)))
print('Val R2: {:.4f}'.format(model.score(X_test, y_test)))
explainer = shapiq.TreeExplainer(
    model=model,
    interaction_type="SII",
    max_order=2,
    min_order=1,
)
x = X_test[0]
interaction_values = explainer.explain(x)
But it seems to work on synthetic data:
import shapiq
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
X, y = make_classification(
    n_samples=100,
    n_features=7,
    random_state=42,
    n_classes=3,
    n_informative=7,
    n_repeated=0,
    n_redundant=0,
)
model = DecisionTreeClassifier(random_state=42, max_depth=2)
model.fit(X, y)
explainer = shapiq.TreeExplainer(model, max_order=2, min_order=1, interaction_type="SII")
x_explain = X[0]
explanation = explainer.explain(x_explain)
print(explanation)
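
The report above does not include the error message from the bike example. A minimal sketch for capturing the full traceback so it can be pasted into the issue is shown below. `capture_traceback` and `failing_call` are hypothetical helpers written for illustration and are not part of shapiq; `failing_call` only stands in for the failing `explainer.explain(x)` call.

```python
import traceback


def capture_traceback(fn, *args, **kwargs):
    """Call fn(*args, **kwargs).

    Returns (result, None) on success, or (None, traceback_text) if fn raises.
    Hypothetical helper for this issue, not a shapiq function.
    """
    try:
        return fn(*args, **kwargs), None
    except Exception:
        # format_exc() returns the text Python would print for the active exception
        return None, traceback.format_exc()


# Stand-in for the failing call; in the bike example this would be
#     capture_traceback(explainer.explain, x)
def failing_call(value):
    raise ValueError(f"cannot explain {value!r}")


result, tb_text = capture_traceback(failing_call, 0)
print(tb_text)
```

Running the same helper on the synthetic example should give a non-`None` result and `None` for the traceback text, which makes the difference between the two cases explicit.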