Retrieving the Original Values from Sklearn Pipeline

Question

woochan-jang opened this issue 2 years ago

Hello,

I'm trying to incorporate sklearn pipelines into Explainerdashboard, as below:

from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

from explainerdashboard import ExplainerDashboard, ClassifierExplainer
from explainerdashboard.dashboard_components import *
from explainerdashboard.custom import *
from explainerdashboard.datasets import titanic_survive, feature_descriptions


X_train, y_train, X_test, y_test = titanic_survive()
model = RandomForestClassifier(n_estimators=50, max_depth=5) # .fit(X_train, y_train)
pipeline = make_pipeline(StandardScaler(), model)
pipeline.fit(X_train, y_train)

explainer = ClassifierExplainer(
    pipeline, # model,
    X_test,
    y_test,
    # cats=["Sex", "Deck", "Embarked"],
    labels=["Not Survived", "Survived"],
    descriptions=feature_descriptions,
)

ExplainerDashboard(
    explainer,
    tabs=[
        IndividualPredictionsComposite,
    ],
).run(port=9050, debug=True)

I expected to see the pre-scaled data in the dashboard (e.g. sex_male=0 or 1). However, the values shown on the dashboard appear to be the data after it has gone through the StandardScaler step (e.g. sex_male=0.7, 1.3).
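To illustrate why a 0/1 column stops looking binary after scaling, here is a minimal standard-library sketch of the arithmetic that standard scaling performs (subtract the column mean, divide by the population standard deviation) and of its inverse. The sex_male values below are made up for illustration and are not taken from the Titanic dataset.

```python
from statistics import mean, pstdev

# Hypothetical 0/1 column standing in for a binary feature such as sex_male.
sex_male = [1, 0, 1, 1, 0, 1, 0, 1]

mu = mean(sex_male)        # column mean
sigma = pstdev(sex_male)   # population std (ddof=0), which StandardScaler also uses

# Forward transform: z = (x - mean) / std
scaled = [(x - mu) / sigma for x in sex_male]

# Inverse transform: x = z * std + mean (rounded to undo floating-point noise)
restored = [round(z * sigma + mu) for z in scaled]

print(sorted(set(round(z, 3) for z in scaled)))  # still only two distinct values
print(restored == sex_male)
```

The scaled column still takes exactly two values, one negative and one positive, so the information is preserved; it is only the display that is no longer 0 or 1.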

Is there any way to achieve my goal?

Thank you very much for an incredible open source work!

woochan-jang · Answer 1 · Wed Nov 30 2022 18:26:16 GMT+0800 (China Standard Time)

Sorry, I found the relevant documentation. The shap='kernel' option takes forever though, so any ideas on how to speed it up would be awesome.
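One common way to cut kernel SHAP run time is to shrink both the background set and the number of rows being explained, since the cost grows with each. The sketch below is only an illustration of that idea, not a confirmed recommendation from the maintainers. The helper names subsample_rows and build_kernel_explainer and the default sizes are hypothetical, and it assumes that ClassifierExplainer accepts the shap="kernel" and X_background arguments and that X and y are pandas objects.

```python
import random


def subsample_rows(X, y=None, n=100, random_state=0):
    """Return at most n randomly chosen rows of X (and the matching rows of y)."""
    n = min(n, len(X))
    # Sample row positions so X and y stay aligned even with a non-unique index.
    positions = random.Random(random_state).sample(range(len(X)), n)
    if y is None:
        return X.iloc[positions]
    return X.iloc[positions], y.iloc[positions]


def build_kernel_explainer(pipeline, X_train, X_test, y_test,
                           n_background=50, n_explain=200):
    """Hypothetical helper: kernel explainer on a reduced background and test set."""
    # Imported lazily so subsample_rows can be used without explainerdashboard.
    from explainerdashboard import ClassifierExplainer

    X_background = subsample_rows(X_train, n=n_background)
    X_small, y_small = subsample_rows(X_test, y_test, n=n_explain)

    # Assumption: shap="kernel" and X_background are supported keyword arguments.
    return ClassifierExplainer(
        pipeline,
        X_small,
        y_small,
        shap="kernel",
        X_background=X_background,
        labels=["Not Survived", "Survived"],
    )
```

With the pipeline from the question, this could be called as build_kernel_explainer(pipeline, X_train, X_test, y_test). Smaller samples make the SHAP values noisier, so the sizes are a trade-off between speed and stability.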