Predictions differ

Hoeze opened this issue a year ago

Florian R. Hölzlwimmer commented a year ago

ebm2onnx version: 1.3.0
Python version: 3.8
Operating System: Arch Linux

Description

I would like to convert an interpret v0.2.7 model to ONNX to preserve it for future use.
However, the predictions I get from the converted model differ strongly from those of the original model:

>>> original_result
array([[0.98610258, 0.01389742],
       [0.99524099, 0.00475901],
       [0.99398961, 0.00601039],
       [0.99739259, 0.00260741]])
>>> onnx_result
[array([[0.9861026 , 0.01389742],
       [0.96940696, 0.03059301],
       [0.9602685 , 0.03973148],
       [0.98411477, 0.01588521]], dtype=float32)]
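
To make the mismatch concrete, the two outputs above can be compared row by row. This is only a sketch: the arrays are copied from the printed results, and the 1e-5 tolerance is an assumed threshold for float32 rounding, not a value from ebm2onnx.

```python
import numpy as np

# Values copied from the outputs quoted above.
original_result = np.array([
    [0.98610258, 0.01389742],
    [0.99524099, 0.00475901],
    [0.99398961, 0.00601039],
    [0.99739259, 0.00260741],
])
onnx_result = np.array([
    [0.9861026, 0.01389742],
    [0.96940696, 0.03059301],
    [0.9602685, 0.03973148],
    [0.98411477, 0.01588521],
], dtype=np.float32)

# Largest absolute difference within each row.
row_diff = np.abs(original_result - onnx_result).max(axis=1)

# Rows whose difference exceeds the assumed float32-level tolerance.
mismatched_rows = np.flatnonzero(row_diff > 1e-5)

print(row_diff)
print(mismatched_rows)
```

Only the first row agrees to within rounding; the other three rows are off by more than 0.01, which points to a systematic encoding problem rather than float32 precision.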

What I Did

My conversion script:

#!/usr/bin/env python3
# requires "interpret==0.2.7" "interpret_core==0.2.7" "ebm2onnx==1.3"
import onnx
import onnxruntime
import ebm2onnx

import pickle
import json

import numpy as np
import pandas as pd

# Load the pickled EBM model.
with open("AbSplice_DNA.pkl", "rb") as fd:
    absplice_dna_model = pickle.load(fd)

# Print the feature name -> feature type mapping stored in the model.
print(json.dumps(dict(zip(absplice_dna_model.feature_names, absplice_dna_model.feature_types)), indent=2))

test_df = pd.read_parquet("test.parquet")

# Convert the model to ONNX, deriving the input dtypes from the test dataframe.
onnx_model = ebm2onnx.to_onnx(
    absplice_dna_model,
    ebm2onnx.get_dtype_from_pandas(test_df),
    predict_proba=True
)
onnx.save_model(onnx_model, 'ebm_model.onnx')
session = onnxruntime.InferenceSession('ebm_model.onnx')

# Predict with the original model.
original_result = absplice_dna_model.predict_proba(test_df)
print(original_result)
# Predict with the ONNX model, passing one input array per dataframe column.
onnx_result = session.run(None, {k: np.asarray(v) for k, v in test_df.items()})
print(onnx_result)

Further, you can find all necessary files to reproduce my issue in the attached zip file:
onnx_test.zip

Any help would be highly appreciated!

Romain Picard · Answer 1 · Wed May 31 2023 23:34:12 GMT+0800 (China Standard Time)

~~Did you mean 3.1.0 instead of 1.3.0 for the ebm2onnx version?~~
OK, I see in the script that it is indeed 1.3.0.

Do you have the result of "ebm2onnx.get_dtype_from_pandas(test_df)" or can you share the test set?

Florian R. Hölzlwimmer · Answer 2 · Wed May 31 2023 23:56:03 GMT+0800 (China Standard Time)

Hi @MainRo, thanks for looking into it!
Yes, please see above: the reproducible example is in the onnx_test.zip file.

Romain Picard · Answer 3 · Tue Jun 27 2023 16:08:02 GMT+0800 (China Standard Time)

I started to analyze the issue but have not yet found where it comes from. The scores associated with each term do not seem correct in the converted model.
Can you try to:

- retrain with the same environment but disable the interactions
- retrain with the latest version of interpretml (0.4.2) and ebm2onnx (3.1.1)

Romain Picard · Answer 4 · Tue Jun 27 2023 16:50:32 GMT+0800 (China Standard Time)

@Hoeze forget my previous comment.

Can you check the type of the splice_site_is_expressed column when training the model? Especially, check that it is declared as an int and not a float.
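
A minimal sketch of such a dtype check with pandas follows. The dataframe is a hypothetical stand-in for the training data; only the column name comes from this thread.

```python
import pandas as pd

# Hypothetical stand-in for the training frame; only the column name is real.
train_df = pd.DataFrame({"splice_site_is_expressed": [0, 1, 1, 0]})

column = train_df["splice_site_is_expressed"]
is_int_before = pd.api.types.is_integer_dtype(column)

# An explicit cast to float changes the dtype to float64,
# even though the values are still 0 and 1.
as_float = column.astype(float)
is_int_after = pd.api.types.is_integer_dtype(as_float)

print(column.dtype, is_int_before)
print(as_float.dtype, is_int_after)
```

The values look identical in both cases, so the dtype has to be inspected explicitly instead of judging from the printed data.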

This is a categorical column and the values in the dataframe are 0 or 1. The type of the column in the parquet file is integer, but I see that internally, EBM treats the values as floats before doing the categorical encoding.

The difference comes from this feature, most likely because its values are converted to floats at some point.

When I change the type of this column to string and update the internal types of the EBM model, interpret and ONNX give similar values.
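
The dataframe half of that workaround could look like the sketch below. The data is hypothetical, and the matching update to the EBM model's internal types is not shown because it depends on the interpret version.

```python
import pandas as pd

# Hypothetical data; only the column name comes from this thread.
test_df = pd.DataFrame({"splice_site_is_expressed": [0, 1, 1, 0]})

# Casting the integer column directly to string yields "0"/"1" labels.
int_labels = test_df["splice_site_is_expressed"].astype(str).tolist()

# Going through float first yields "0.0"/"1.0" labels instead.
float_labels = (
    test_df["splice_site_is_expressed"].astype(float).astype(str).tolist()
)

print(int_labels)
print(float_labels)
```

Whichever form is chosen, the labels seen at training time and at inference time need to be the same strings.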

Florian R. Hölzlwimmer · Answer 5 · Thu Aug 10 2023 04:22:31 GMT+0800 (China Standard Time)

Thanks a lot @MainRo, now this makes a lot of sense.
"splice_site_is_expressed" in the EBM gets converted from int -> float -> string -> int...
E.g. splice_site_is_expressed == 1 (int) -> 1.0 (float) -> "1.0" (string) -> 0 (int) 🤦
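
The chain above can be reproduced in plain Python. This is an illustrative sketch, not ebm2onnx's actual code: the category table, the `encode` helper, and the fallback index 0 for unknown labels are all assumptions.

```python
# Hypothetical category table keyed by integer-style labels.
category_to_index = {"0": 0, "1": 1}


def encode(value, default=0):
    """Mimic the int -> float -> string -> int chain described above."""
    label = str(float(value))  # 1 -> 1.0 -> "1.0"
    # "1.0" is not a key of the table, so the assumed fallback index is used.
    return category_to_index.get(label, default)


via_float = [encode(v) for v in (0, 1)]
direct = [category_to_index[str(v)] for v in (0, 1)]

print(via_float)
print(direct)
```

Under these assumptions both input values collapse to index 0 when routed through float, while the direct int-to-string path keeps them apart.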

I manually fixed this in the ONNX models using onnx-modifier.