Unexplained RAM exhaustion with ensemble voting classifier model
apavlo89 opened this issue · comments
Hello,
I am experiencing an issue with high RAM usage when running ExplainerDashboard for a VotingClassifier
ensemble model. The model is trained on a 2,000-row dataset with 400 features in total; however, each classifier in the voting ensemble uses only a subset of that feature set. My goal is to understand how my algorithm makes predictions for a dataset whose label outcomes I do not yet know. Despite the dataset to be explained being relatively small (28 rows), RAM usage spikes to over 51 GB. Keep in mind I am running this in Google Colab. I suspect this might be related to how ExplainerDashboard handles ensemble models, or to the computation of SHAP values for such complex models, or it might just be a bug. Below is a simplified version of my setup:
Model Setup
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# Other imports...
# Define pipelines for individual models (example)
lr_pipeline = Pipeline([...])
xgb_pipeline = Pipeline([...])
# Other pipelines...
# VotingClassifier ensemble
eclf = VotingClassifier(
    estimators=[
        ('lr', lr_pipeline),
        ('xgb', xgb_pipeline),
        # Other models...
    ],
    voting='soft'
)
eclf.fit(X_train, y_train)
from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from pyngrok import ngrok  # needed for ngrok.connect below

# Initialize the Explainer without target labels;
# future_predict is the 28-row unlabeled dataset
explainer = ClassifierExplainer(eclf, future_predict, shap='kernel', model_output='probability')

# Create and run the dashboard
dashboard = ExplainerDashboard(explainer)
dashboard.run(port=8050)

# Expose the dashboard from Colab
ngrok_tunnel = ngrok.connect(8050)
print('Public URL:', ngrok_tunnel.public_url)
This is the output:
WARNING: For shap='kernel', shap interaction values can unfortunately not be calculated!
Note: shap values for shap='kernel' normally get calculated against X_background, but paramater X_background=None, so setting X_background=shap.sample(X, 50)...
Generating self.shap_explainer = shap.KernelExplainer(model, X, link='identity')
Building ExplainerDashboard..
Detected google colab environment, setting mode='external'
No y labels were passed to the Explainer, so setting model_summary=False...
For this type of model and model_output interactions don't work, so setting shap_interaction=False...
The explainer object has no decision_trees property. so setting decision_trees=False...
Generating layout...
Calculating shap values...
/usr/local/lib/python3.10/dist-packages/dash/dash.py:538: UserWarning:
JupyterDash is deprecated, use Dash instead.
See https://dash.plotly.com/dash-in-jupyter for more details.
Within just a few seconds of running this code, memory use exceeds the available RAM and the Colab session crashes.
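For context, here is a rough back-of-the-envelope estimate of where the memory goes. This is a sketch, not a measurement: the `2 * n_features + 2048` figure is shap's `KernelExplainer` default for `nsamples='auto'`, and the byte counts assume a single float64 synthetic matrix, ignoring the extra copies each pipeline in a soft-voting ensemble makes during `predict_proba`.

```python
# Estimate KernelExplainer's working-set size for the reported setup:
# 400 features, the default 50-row background sample, 28 rows to explain.
def kernel_shap_cost(n_features, n_background, n_explain, dtype_bytes=8):
    nsamples = 2 * n_features + 2048          # coalition samples per explained row
    synth_rows = nsamples * n_background      # synthetic rows per explained row
    synth_bytes = synth_rows * n_features * dtype_bytes  # one synthetic matrix
    total_predict_rows = n_explain * synth_rows          # rows fed to predict_proba
    return nsamples, synth_rows, synth_bytes, total_predict_rows

nsamples, synth_rows, synth_bytes, total_rows = kernel_shap_cost(400, 50, 28)
print(nsamples)           # coalition samples per explained row
print(synth_rows)         # synthetic rows the model predicts on, per explained row
print(synth_bytes / 1e6)  # MB for one float64 synthetic matrix
print(total_rows)         # total rows pushed through the ensemble
```

With 400 features this comes out to 2,848 coalition samples per row, a ~456 MB synthetic matrix per explained row, and roughly 4 million rows predicted in total. Each prediction passes through every pipeline in the soft-voting ensemble (scalers and transformers each copying the matrix), so a multi-gigabyte footprint is plausible even before any library-specific overhead or bug.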