Retrieving the Original Values from Sklearn Pipeline
woochan-jang opened this issue
woochan-jang commented
Hello,
I'm trying to use sklearn pipelines with explainerdashboard, as below:
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

from explainerdashboard import ExplainerDashboard, ClassifierExplainer
from explainerdashboard.dashboard_components import *
from explainerdashboard.custom import *
from explainerdashboard.datasets import titanic_survive, feature_descriptions

X_train, y_train, X_test, y_test = titanic_survive()

model = RandomForestClassifier(n_estimators=50, max_depth=5)  # .fit(X_train, y_train)
pipeline = make_pipeline(StandardScaler(), model)
pipeline.fit(X_train, y_train)

explainer = ClassifierExplainer(
    pipeline,  # model,
    X_test,
    y_test,
    # cats=["Sex", "Deck", "Embarked"],
    labels=["Not Survived", "Survived"],
    descriptions=feature_descriptions,
)

ExplainerDashboard(
    explainer,
    tabs=[
        IndividualPredictionsComposite,
    ],
).run(port=9050, debug=True)
I expected to see the unscaled data in the dashboard (e.g. sex_male=0 or 1). However, the values shown on the dashboard appear to be the data after the StandardScaler step (e.g. sex_male=0.7, 1.3).
Is there any way to achieve my goal?
Thank you very much for this incredible open-source work!
woochan-jang commented
Sorry, I found the relevant comment in the docs. The shap='kernel' option takes forever though, so any ideas on how to speed it up would be awesome.
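For context on the scaled values described in the first comment, here is a minimal sketch of what standardization does to a 0/1 indicator column and how the original values can be recovered from it. It uses only the Python standard library. The `sex_male` list is made-up illustration data, not the Titanic dataset, and the arithmetic mirrors the documented behavior of `StandardScaler` (subtract the mean, divide by the population standard deviation) rather than calling it.

```python
from statistics import fmean, pstdev

# Hypothetical 0/1 indicator column, standing in for a one-hot feature.
sex_male = [0, 1, 1, 0, 1, 1, 1, 0]

# StandardScaler centers each column and divides by the population
# standard deviation (ddof=0); pstdev computes the same quantity.
mean = fmean(sex_male)
std = pstdev(sex_male)

# Forward transform: the two original values (0 and 1) map to two
# non-integer values, one negative and one positive.
scaled = [(x - mean) / std for x in sex_male]

# Inverse transform: undo the scaling to get the original values back.
restored = [z * std + mean for z in scaled]

print(sorted({round(z, 3) for z in scaled}))
print([round(x) for x in restored])
```

With a fitted pipeline, the same round trip should be available through the scaler step's `inverse_transform` method (for example via `pipeline.named_steps["standardscaler"]`, the name `make_pipeline` assigns by default). Whether the dashboard can be made to display those restored values directly is not something this sketch addresses.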