interpretml / ebm2onnx

A tool to convert EBM models to ONNX

Contributions mismatch for nominal features

lboussengui opened this issue · comments

  • ebm2onnx version: 3.1.1
  • onnxruntime version: 1.16.1
  • interpret version: 0.4.2
  • Python version: 3.10.8
  • Operating System: MacOS

Description

I trained an EBM classification model. This model was initially saved in pickle format.

I used ebm2onnx as shown below to convert my model to the .onnx format.

I noticed that, for a test case, the contributions to the prediction differ for nominal features when running the model in ONNX format: their contributions are set to zero.

Do you have an explanation for this?

What I Did

import ebm2onnx
import pickle
import onnxruntime as rt

# load the first EBM
with open(f'{MODEL_PATH}ebm_first.pkl', 'rb') as f:
    ebm_first = pickle.load(f)

# load dtypes saved during model training
with open(f'{MODEL_PATH}training_dtypes_for_onnx.pkl', 'rb') as f:
    training_dtypes_for_onnx = pickle.load(f)

# transform ebm to onnx 
onnx_model = ebm2onnx.to_onnx(
    model=ebm_first,
    predict_proba=True,  # Generate a dedicated output for probabilities
    explain=True,  # Generate a dedicated output for local explanations
    dtype=training_dtypes_for_onnx,
    name='DEFAULT',
)

Here is the result of the local explanation from the pickled EBM model for one example:

pred_pkl = ebm_first.explain_local(X_test, y_test)
pred_pkl.data(0)['scores']

result is

array([ 0.027,  0.416, -0.158,  0.388,  0.043,  0.   , -0.196,  0.051,
       -0.201, -0.032,  0.176,  0.151,  0.   ,  0.216,  0.2  ,  0.376,
        0.05 ,  0.022, -0.076,  0.028, -0.26 , -0.043,  0.173,  0.269,
       -0.203, -0.025,  0.037, -0.056,  0.164,  0.296,  0.089,  0.08 ,
        0.1  ,  0.098, -0.018, -0.002, -0.001, -0.001, -0.003, -0.002])

After transforming ebm_first into onnx_model, I did the following to imitate inference in production:

onnx_model.ir_version = 9
ebm_onnx = rt.InferenceSession(onnx_model.SerializeToString())
pred_onnx = ebm_onnx.run(None, X_test.to_dict("list"))

# contributions of pred_onnx 
pred_onnx[2][0][:, 0]

result is

array([ 0.027,  0.416, -0.158,  0.388,  0.   ,  0.   , -0.196,  0.051,
       -0.201, -0.032,  0.176,  0.151,  0.   ,  0.216,  0.2  ,  0.376,
        0.05 ,  0.022, -0.076,  0.028, -0.26 , -0.043,  0.173,  0.269,
       -0.203, -0.025,  0.037, -0.056,  0.164,  0.296,  0.089,  0.08 ,
        0.1  ,  0.098, -0.018, -0.002, -0.001, -0.001, -0.003, -0.002],
      dtype=float32)

The two arrays are not equal at indices 4 and 5, which correspond to the only nominal features of the dataset.
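For reference, the disagreement can be located programmatically with np.isclose. On the first eight values copied from the arrays above, only index 4 differs numerically (the score at index 5 happens to be zero in both outputs):

```python
import numpy as np

# first eight local-explanation scores from each model, copied from above
scores_pkl = np.array([0.027, 0.416, -0.158, 0.388, 0.043, 0.0, -0.196, 0.051])
scores_onnx = np.array([0.027, 0.416, -0.158, 0.388, 0.0, 0.0, -0.196, 0.051],
                       dtype=np.float32)

# indices where the two explanations disagree beyond float32 rounding error
mismatch = np.flatnonzero(~np.isclose(scores_pkl, scores_onnx, atol=1e-6))
print(mismatch)  # [4]
```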

Could you publish here a model and a sample input that reproduce the issue?
In the meantime, I will look at reproducing this in a unit test.

Could you confirm that the nominal features are of type boolean?
If so, can you try explicitly converting them to 0/1 before calling ebm_first.explain_local:

import numpy as np

X_test['feature'] = np.where(X_test['feature'] == False, 0, 1)

I suspect an issue in the interpret explain_local implementation: it looks like the boolean features are not mapped correctly and end up with scores of 0.0.

OK, forget my last comment: the problem is that the conversion to ONNX mutates the EBM model object.
If you call ebm_first.explain_local before converting to ONNX, you will get the same values.

Obviously, this is not normal behavior for the converter. I will fix it.
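Until a fix lands, one workaround is to hand ebm2onnx.to_onnx a deep copy of the model so the caller's object stays untouched. A minimal sketch of the pattern (FakeModel and mutating_convert are hypothetical stand-ins that only demonstrate the copy protecting the original; in practice you would pass copy.deepcopy(ebm_first) to ebm2onnx.to_onnx):

```python
import copy

def to_onnx_safely(model, convert_fn, **kwargs):
    """Convert a deep copy so the caller's model object is not mutated."""
    return convert_fn(copy.deepcopy(model), **kwargs)

# hypothetical stand-in for an EBM model
class FakeModel:
    def __init__(self):
        self.term_scores_ = [1.0, 2.0]

# simulates a converter that zeroes scores on the object it receives
def mutating_convert(model):
    model.term_scores_[0] = 0.0
    return "onnx-bytes"

m = FakeModel()
to_onnx_safely(m, mutating_convert)
print(m.term_scores_[0])  # 1.0 -- the original model is untouched
```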