nok / sklearn-porter

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

Prediction for ExtraTree model differs from sklearn (tested for C model)

LambertAn opened this issue

I was trying to implement the predict_proba function for an Extra Tree model when I realized that the result returned by the transpiled version of the model differed from the one returned by sklearn.

My model contains 30 trees and 3 classes. Below are the class probabilities returned by sklearn for each estimator, alongside the class each one predicts:

Estimator      Proba Class 0  Proba Class 1  Proba Class 2  Predicted class
Estimator 0           0.1765         0.0000         0.8235                2
Estimator 1           0.0000         0.0000         1.0000                2
Estimator 2           0.1667         0.0000         0.8333                2
Estimator 3           0.6923         0.0000         0.3077                0
Estimator 4           0.8125         0.0417         0.1458                0
Estimator 5           0.8374         0.0064         0.1562                0
Estimator 6           0.9727         0.0000         0.0273                0
Estimator 7           0.3429         0.0000         0.6571                2
Estimator 8           0.8391         0.0095         0.1514                0
Estimator 9           0.0000         0.0000         1.0000                2
Estimator 10          0.7266         0.0078         0.2656                0
Estimator 11          0.6220         0.0000         0.3780                0
Estimator 12          0.5000         0.0000         0.5000                0
Estimator 13          0.6117         0.0000         0.3883                0
Estimator 14          0.0000         0.0000         1.0000                2
Estimator 15          0.8687         0.0000         0.1313                0
Estimator 16          1.0000         0.0000         0.0000                0
Estimator 17          0.8468         0.0170         0.1362                0
Estimator 18          0.5595         0.0000         0.4405                0
Estimator 19          0.0714         0.0000         0.9286                2
Estimator 20          0.4600         0.0000         0.5400                2
Estimator 21          0.0000         0.0000         1.0000                2
Estimator 22          0.5217         0.0000         0.4783                0
Estimator 23          0.8322         0.0049         0.1629                0
Estimator 24          0.5000         0.0000         0.5000                0
Estimator 25          0.3333         0.0000         0.6667                2
Estimator 26          1.0000         0.0000         0.0000                0
Estimator 27          0.4545         0.0000         0.5455                2
Estimator 28          0.0000         0.0000         1.0000                2
Estimator 29          0.0000         0.0000         1.0000                2
MODEL                 0.4916         0.0029         0.5055                2

17 estimators predict class 0 and 13 predict class 2, but the model predicts class 2 because it has the highest probability once the per-estimator probabilities are averaged.

It therefore seems to me that the transpiled model should also base its decision on the averaged predicted probabilities rather than on a majority vote of the individual trees.
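The two decision rules can be checked against the table above. The following is a minimal sketch in plain Python, not code from sklearn-porter: it copies the per-estimator probabilities from the table, counts one vote per tree, and separately averages the probabilities. The names `PROBAS`, `argmax`, `vote_counts` and `mean_probas` are made up for this illustration.

```python
# Per-estimator class probabilities copied from the table (30 trees, 3 classes).
PROBAS = [
    (0.1765, 0.0000, 0.8235), (0.0000, 0.0000, 1.0000), (0.1667, 0.0000, 0.8333),
    (0.6923, 0.0000, 0.3077), (0.8125, 0.0417, 0.1458), (0.8374, 0.0064, 0.1562),
    (0.9727, 0.0000, 0.0273), (0.3429, 0.0000, 0.6571), (0.8391, 0.0095, 0.1514),
    (0.0000, 0.0000, 1.0000), (0.7266, 0.0078, 0.2656), (0.6220, 0.0000, 0.3780),
    (0.5000, 0.0000, 0.5000), (0.6117, 0.0000, 0.3883), (0.0000, 0.0000, 1.0000),
    (0.8687, 0.0000, 0.1313), (1.0000, 0.0000, 0.0000), (0.8468, 0.0170, 0.1362),
    (0.5595, 0.0000, 0.4405), (0.0714, 0.0000, 0.9286), (0.4600, 0.0000, 0.5400),
    (0.0000, 0.0000, 1.0000), (0.5217, 0.0000, 0.4783), (0.8322, 0.0049, 0.1629),
    (0.5000, 0.0000, 0.5000), (0.3333, 0.0000, 0.6667), (1.0000, 0.0000, 0.0000),
    (0.4545, 0.0000, 0.5455), (0.0000, 0.0000, 1.0000), (0.0000, 0.0000, 1.0000),
]


def argmax(values):
    """Index of the largest value; ties go to the lowest index."""
    return max(range(len(values)), key=lambda i: values[i])


def vote_counts(probas):
    """Majority vote: each tree casts one vote for its most probable class."""
    counts = [0] * len(probas[0])
    for row in probas:
        counts[argmax(row)] += 1
    return counts


def mean_probas(probas):
    """Averaging: mean probability per class across all trees."""
    return [sum(column) / len(probas) for column in zip(*probas)]


counts = vote_counts(PROBAS)
means = mean_probas(PROBAS)
print("votes per class:", counts)
print("majority-vote class:", argmax(counts))
print("mean probabilities:", [round(m, 4) for m in means])
print("averaged-probability class:", argmax(means))
```

On these values the vote count favours class 0 (17 votes to 13) while the averaged probabilities favour class 2, which is the disagreement described in this report.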

What do you think?

Hello @LambertAn, thanks for your detailed report. Can you provide some data to reproduce the behaviour? And did you run the integrity check with integrity_score? What score did you get?

Thanks for getting back to me.

Below is code to build a 3-class extra-trees classifier on random data.

from sklearn_porter import Porter
from sklearn.ensemble import ExtraTreesClassifier
import numpy as np

# Build random dataset
prng = np.random.RandomState(123)
X = prng.rand(50, 10)
y = prng.randint(0, 3, 50)

# Fit model
model = ExtraTreesClassifier(n_estimators=3, max_depth=3, random_state=prng)
model.fit(X, y)

# export:
porter = Porter(model, language='c')
output = porter.export(embed_data=True)
with open('extratree_randomdataset_original.c', 'w') as f_out:
    f_out.write(output)

# Integrity score: compare the transpiled model's predictions with sklearn's
integrity = porter.integrity_score(X)
print(integrity)

# Show per-estimator and ensemble predictions for a single point
test_point = X[2:3]
for i, estimator in enumerate(model.estimators_):
    print("{}: {} -> {}".format(i, estimator.predict_proba(test_point), estimator.predict(test_point)))
print(model.predict_proba(test_point))
print(model.predict(test_point))

print(test_point)

The integrity score on the training data is 0.86. Let's look at the result for one of the data points, where each estimator predicts a different class:

Estimator 0 predicts class 0 with probabilities [0.45 0.20 0.35]
Estimator 1 predicts class 2 with probabilities [0.17 0.08 0.75]
Estimator 2 predicts class 1 with probabilities [0.24 0.52 0.24]

The model predicts class 2 with probabilities [0.29 0.27 0.45].

I attached the above Python code and two C files: the original model as generated by sklearn-porter, and a modified version that calculates the probabilities for each estimator as well as their average for the model prediction:

sklearn_porter_issue35.zip

For the above point, the original 'predict' method returns class 0 and the new 'predict_proba' method returns [0.29 0.27 0.45].
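The three-tree case is small enough to work through by hand. The sketch below reuses the probabilities reported for this test point. It assumes, without having confirmed it in the generated C code, that a tie between vote counts is resolved in favour of the lowest class index; under that assumption a majority vote gives class 0, while averaging gives class 2.

```python
# Per-tree probabilities for the test point, as reported in this thread.
tree_probas = [
    [0.45, 0.20, 0.35],  # estimator 0 -> class 0
    [0.17, 0.08, 0.75],  # estimator 1 -> class 2
    [0.24, 0.52, 0.24],  # estimator 2 -> class 1
]
n_classes = 3

# Majority vote: each tree contributes one vote for its most probable class.
votes = [0] * n_classes
for proba in tree_probas:
    votes[proba.index(max(proba))] += 1

# Assumption: ties are broken in favour of the lowest class index
# (list.index returns the first position of the maximum).
hard_class = votes.index(max(votes))

# Averaging: mean probability per class across the trees, then argmax.
mean_proba = [sum(column) / len(tree_probas) for column in zip(*tree_probas)]
soft_class = mean_proba.index(max(mean_proba))

print("votes:", votes, "-> class", hard_class)
print("mean probabilities:", [round(p, 2) for p in mean_proba], "-> class", soft_class)
```

Each tree votes for a different class here, so the vote is a three-way tie and the outcome depends entirely on the tie-breaking rule, whereas the averaged probabilities give a clear winner.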

I hope it is enough to reproduce the problem.

Hello @LambertAn, we found a small bug and fixed it (release/0.7.0: Merge branch 'master' into release/0.7.0). Can you please reinstall the package and test it again?

pip uninstall -y sklearn-porter
pip install --no-cache-dir https://github.com/nok/sklearn-porter/zipball/master

Hi, I finally had some time to test, but unfortunately the problem is not fixed. I used the Python script above and got exactly the same results as before, with an integrity score of 0.86.
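To put the 0.86 in perspective, here is a rough sketch that assumes the integrity score is simply the fraction of samples on which the transpiled model and sklearn agree. That is an assumption about what `integrity_score` measures, and `agreement_rate` is a hypothetical helper, not part of sklearn-porter. The label lists are invented placeholders, chosen only so that 43 of 50 predictions match.

```python
def agreement_rate(reference, candidate):
    """Fraction of positions where two prediction lists hold the same label."""
    if len(reference) != len(candidate):
        raise ValueError("prediction lists must have the same length")
    matches = sum(1 for r, c in zip(reference, candidate) if r == c)
    return matches / len(reference)


# Placeholder labels (not real model output): 50 samples, 7 disagreements.
sklearn_preds = [2] * 50
ported_preds = [2] * 43 + [0] * 7

score = agreement_rate(sklearn_preds, ported_preds)
print(score)
```

Under that assumption, a score of 0.86 on the 50 training samples would mean the two implementations disagree on 7 of them.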

I believe this is the same issue as #52.