oegedijk / explainerdashboard

Quickly build Explainable AI dashboards that show the inner workings of so-called "blackbox" machine learning models.

Home Page: http://explainerdashboard.readthedocs.io


Unexplained RAM exhaustion with ensemble VotingClassifier model

apavlo89 opened this issue · comments

Hello,

I am experiencing an issue with high RAM usage when running ExplainerDashboard for a VotingClassifier ensemble model. The model is trained on a 2,000-row dataset with 400 features in total, though each classifier in the VotingClassifier uses a subset of this feature set. My goal is to understand how my algorithm makes predictions for a dataset whose label outcomes I do not yet know. Despite the dataset for explanation being relatively small (28 rows), RAM usage spikes to over 51GB. Keep in mind I am running this in Google Colab. I suspect this might be related to how ExplainerDashboard handles ensemble models, or to the computation of SHAP values for such complex models, or it might just be a bug. Below is a simplified version of my setup:
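A rough back-of-envelope calculation (my estimate, not a measurement) suggests the RAM spike may simply be kernel SHAP's cost at this feature count: shap's KernelExplainer defaults to `nsamples = 2*M + 2048` coalition samples per explained row, and each coalition sample is evaluated against every background row (50 here, per the log below). All numbers except M=400, 50 background rows, and 28 rows are derived:

```python
# Back-of-envelope kernel-SHAP cost estimate (assumes shap's default
# nsamples="auto" -> 2*M + 2048 and the 50-row background sample from the log).
M = 400            # number of features
n_background = 50  # background rows (explainerdashboard's shap.sample(X, 50) fallback)
n_explain = 28     # rows to explain

nsamples = 2 * M + 2048                  # coalition samples per explained row
synth_rows = nsamples * n_background     # synthetic rows built per explained row
total_evals = synth_rows * n_explain     # total model evaluations overall
synth_gb = synth_rows * M * 8 / 1e9      # float64 synthetic matrix, per row, in GB

print(nsamples, synth_rows, total_evals, round(synth_gb, 2))
```

So each explained row alone implies ~142k synthetic rows (a ~0.46GB float64 matrix) pushed through a soft-voting ensemble, and nearly 4 million model evaluations in total, which would explain runaway memory on Colab.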

Model Setup

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# Other imports...

# Define pipelines for individual models (example)
lr_pipeline = Pipeline([...])
xgb_pipeline = Pipeline([...])
# Other pipelines...

# VotingClassifier ensemble
eclf = VotingClassifier(
    estimators=[
        ('lr', lr_pipeline),
        ('xgb', xgb_pipeline),
        # Other models...
    ],
    voting='soft'
)

eclf.fit(X_train, y_train)


from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from pyngrok import ngrok

# Initialize the Explainer on the 28 unlabeled rows (no target labels passed)
explainer = ClassifierExplainer(eclf, future_predict, shap='kernel', model_output='probability')

# Create and run the dashboard, exposing it through an ngrok tunnel
dashboard = ExplainerDashboard(explainer)
dashboard.run(port=8050)
ngrok_tunnel = ngrok.connect(8050)
print('Public URL:', ngrok_tunnel.public_url)

This is the output:

WARNING: For shap='kernel', shap interaction values can unfortunately not be calculated!
Note: shap values for shap='kernel' normally get calculated against X_background, but paramater X_background=None, so setting X_background=shap.sample(X, 50)...
Generating self.shap_explainer = shap.KernelExplainer(model, X, link='identity')
Building ExplainerDashboard..
Detected google colab environment, setting mode='external'
No y labels were passed to the Explainer, so setting model_summary=False...
For this type of model and model_output interactions don't work, so setting shap_interaction=False...
The explainer object has no decision_trees property. so setting decision_trees=False...
Generating layout...
Calculating shap values...
/usr/local/lib/python3.10/dist-packages/dash/dash.py:538: UserWarning:

JupyterDash is deprecated, use Dash instead.
See https://dash.plotly.com/dash-in-jupyter for more details.

Within just a few seconds of running the code, RAM usage exceeds what is available and the session crashes.