interpretml / ebm2onnx

A tool to convert EBM models to ONNX

Converting sklearn pipelines with EBM models to ONNX

ReneeErnst opened this issue

  • ebm2onnx version: 3.1.1
  • Python version: 3.9.7
  • Operating System: Ubuntu

Description

Feature request: the ability to use EBM models in sklearn pipelines and to convert the whole pipeline to ONNX. This would require some work so that the model can be registered when using sklearn-onnx.

I want to save a sklearn pipeline that includes an EBM model to ONNX, rather than JUST the EBM model. This is a common use case where you want to pair your data processing with the model object in a pipeline. It does not appear that this functionality is included at this time.

Ideally, ebm2onnx would be able to save pipelines that include EBM models to ONNX. Below is an example script that would ideally work.

What I Did

import ebm2onnx
import pandas as pd
from interpret import glassbox
from sklearn import compose, impute, pipeline, preprocessing

features = [
    "feature_a",
    "feature_b",
    "feature_c",
    "feature_d",
    "feature_e",
    "feature_f",
    "feature_g",
]

df_train = pd.DataFrame(
    {
        "feature_a": [0, 0.5, 2, 5],
        "feature_b": [0, 0.5, 2, 5],
        "feature_c": [0, 0.5, 2, 5],
        "feature_d": [0, 0.5, 2, 5],
        "feature_e": [0, 1, 0, 1],
        "feature_f": [1, 0, 1, 0],
        "feature_g": ["a", "b", "can_not_determine", "can_not_determine"],
        "target": [1, 1, 0, 0],
    }
)
numeric_mean_transformer = pipeline.Pipeline(
    steps=[
        ("imputer", impute.SimpleImputer(strategy="mean")),
        ("scaler", preprocessing.StandardScaler()),
    ]
)

numeric_median_transformer = pipeline.Pipeline(
    steps=[
        ("imputer", impute.SimpleImputer(strategy="median")),
        ("scaler", preprocessing.StandardScaler()),
    ]
)

categorical_transformer = pipeline.Pipeline(
    steps=[
        (
            "onehot",
            preprocessing.OneHotEncoder(
                sparse=True,
                # Assumes I have 2 bool and 1 cat feature, and I'm specifying what
                # values I want to drop when one hot encoding.
                drop=[0, 0, "can_not_determine"],
                handle_unknown="ignore",
            ),
        )
    ]
)

preprocessor = compose.ColumnTransformer(
    transformers=[
        (
            "num_mean",
            numeric_mean_transformer,
            ["feature_a", "feature_b"],
        ),
        (
            "num_median",
            numeric_median_transformer,
            ["feature_c", "feature_d"],
        ),
        ("cat", categorical_transformer, ["feature_e", "feature_f", "feature_g"]),
    ]
)

my_pipeline = pipeline.Pipeline(
    [
        ("preprocessor", preprocessor),
        (
            "model",
            glassbox.ExplainableBoostingClassifier(
                max_bins=8,
                min_samples_leaf=2,
                max_leaves=2,
                learning_rate=0.5,
                validation_size=0.5,
                early_stopping_rounds=5,
                interactions=10,
                random_state=42,
            ),
        ),
    ]
)

my_pipeline.fit(df_train[features], df_train["target"])

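# Desired call: pass the whole fitted pipeline, not just the EBM model.
# This is the part that is not supported at this time.
onnx_pipeline = ebm2onnx.to_onnx(
    my_pipeline, ebm2onnx.get_dtype_from_pandas(df_train[features])
)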

I may have jumped into this too fast, and will update if I get this working. I think I can register this converter and make it work, as documented here: https://onnx.ai/sklearn-onnx/pipeline.html

Yeah, looks like some additional work would be needed. It would be great to have this included in ebm2onnx.

Yes, this is something that would be great to have.
The conversion of the full sklearn pipeline cannot be done by ebm2onnx. This converter is only for the EBM model.

I will look further in the sklearn documentation but according to the link you provided, we just need to register the ebm converter:
https://onnx.ai/sklearn-onnx/pipeline.html#new-converters-in-a-pipeline

Then, skl2onnx should be able to convert the whole pipeline including the ebm model.
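
For reference, here is a rough sketch of what that registration step might look like. It uses skl2onnx's `update_registered_converter` hook as described in the linked documentation. The alias string and the `ebm_converter` function are hypothetical placeholders, not part of ebm2onnx: the converter body, which would have to translate the fitted EBM into ONNX nodes, is exactly the work that remains, so it is left unimplemented here. Reusing skl2onnx's linear classifier shape calculator is also an assumption about what the EBM outputs should look like (a label and a probability tensor).

```python
# Sketch only: registers a placeholder EBM converter with skl2onnx.
# The alias string and the converter body are hypothetical, not ebm2onnx API.


def ebm_converter(scope, operator, container):
    """Placeholder for the function that would emit the EBM's ONNX nodes.

    skl2onnx calls converters with (scope, operator, container); the fitted
    estimator is available as operator.raw_operator.
    """
    raise NotImplementedError(
        "Translating the fitted EBM into ONNX nodes is the part still to be written."
    )


def register_ebm_converter():
    """Tell skl2onnx which functions handle ExplainableBoostingClassifier."""
    # Imported lazily so this module loads even without the optional packages.
    from interpret.glassbox import ExplainableBoostingClassifier
    from skl2onnx import update_registered_converter
    from skl2onnx.common.shape_calculator import (
        calculate_linear_classifier_output_shapes,
    )

    update_registered_converter(
        ExplainableBoostingClassifier,
        "InterpretExplainableBoostingClassifier",  # hypothetical alias
        calculate_linear_classifier_output_shapes,  # assumed: label + probabilities
        ebm_converter,
        options={"nocl": [True, False], "zipmap": [True, False]},
    )
```

With a real converter in place of the placeholder, the expectation would be to call `register_ebm_converter()` once and then convert the pipeline with `skl2onnx.convert_sklearn(my_pipeline, initial_types=...)`, where `initial_types` describes the raw input columns.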

That makes complete sense. After poking around a bit, I figured that it likely wouldn't end up in ebm2onnx, but instead need to be registered. Hopefully that's something that could happen - it would be an awesome add.