nok / sklearn-porter

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

C code generator with multi-output RandomForestClassifier: invalid code generated and general failure to handle multi-dimensional output

mg169706 opened this issue · comments

I'm creating a Random Forest Classifier with 248 inputs and 108 outputs. Based on the Boolean state of each input, each of the 108 outputs will be on or off (they represent valves). The values of these discrete output states are what the system has learned. I'm having two issues with this:

  1. The code generator only seems to create trees for one output, and I don't know which one. For each output I'd expect a separate set of trees, because the inputs remain the same, but the decision tree for each valve's state will be different.

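  2. The code generated for that single output is invalid C: the array size in the `classes` declaration is a bracketed list of numbers where C expects a single integer constant. See the example fragment below.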

    `int predict_0(float features[]) {
    int classes[[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]];

    if (features[181] <= 0.5) { ... }
    }`
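For comparison, here is a minimal sketch of what a well-formed multi-output declaration could look like. This is not what sklearn-porter emits; `N_OUTPUTS`, `N_CLASSES`, `predict_tree_multi`, the threshold, and the vote counts are all hypothetical and chosen only for illustration. It assumes every output is binary, as with the valves, so the per-output class counts become a second array dimension rather than ending up inside the array-size brackets.

```c
#define N_OUTPUTS 3   /* hypothetical; the model in this issue has 108 */
#define N_CLASSES 2   /* assumption: each valve is either off (0) or on (1) */

/* Hypothetical single tree: writes one predicted class per output into out[]. */
void predict_tree_multi(const float features[], int out[N_OUTPUTS]) {
    /* One row of class votes per output, instead of one flat array. */
    int votes[N_OUTPUTS][N_CLASSES] = {{0}};

    /* Made-up split and leaf values standing in for a learned tree. */
    if (features[0] <= 0.5f) {
        votes[0][0] = 5; votes[1][1] = 5; votes[2][0] = 5;
    } else {
        votes[0][1] = 4; votes[1][1] = 4; votes[2][0] = 4;
    }

    /* Pick the class with the most votes separately for each output. */
    for (int o = 0; o < N_OUTPUTS; o++) {
        int best = 0;
        for (int c = 1; c < N_CLASSES; c++) {
            if (votes[o][c] > votes[o][best]) {
                best = c;
            }
        }
        out[o] = best;
    }
}
```

A caller would pass the feature array plus an `int` buffer of length `N_OUTPUTS` and read one state per valve from the buffer afterwards.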

commented

Any updates on this issue? I have the same problem.

commented

OK, I found a workaround of sorts. You can use sklearn.multioutput.MultiOutputClassifier to fit one classifier per output, then export each entry of the multi-output classifier's .estimators_ list as a separate classifier. It does mean you have to modify the C code a little, as you now have multiple separate classifiers.
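A minimal sketch of the C-side glue this workaround implies, assuming each separately exported classifier has been renamed to a distinct function with the signature `int f(float features[])`. The names `predict_valve_0` through `predict_valve_2`, `PREDICTORS`, and `predict_all` are hypothetical, and the stub bodies only stand in for the generated trees.

```c
#include <stddef.h>

#define N_VALVES 3  /* hypothetical; 108 in this issue */

/* Stubs standing in for the per-output classifiers exported one by one. */
int predict_valve_0(float features[]) { return features[0] > 0.5f; }
int predict_valve_1(float features[]) { return features[1] > 0.5f; }
int predict_valve_2(float features[]) {
    return features[0] > 0.5f && features[1] > 0.5f;
}

/* Common signature shared by every per-output classifier. */
typedef int (*predict_fn)(float features[]);

/* Dispatch table: entry i predicts the state of valve i. */
const predict_fn PREDICTORS[N_VALVES] = {
    predict_valve_0,
    predict_valve_1,
    predict_valve_2,
};

/* Run every per-output classifier on the same inputs and collect the states. */
void predict_all(float features[], int states[N_VALVES]) {
    for (size_t i = 0; i < N_VALVES; i++) {
        states[i] = PREDICTORS[i](features);
    }
}
```

With a table like this, the only per-output edits are the function names and the table entries, which could in principle be produced by a small script instead of by hand.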

> OK, I found a workaround of sorts. You can use sklearn.multioutput.MultiOutputClassifier to fit one classifier per output, then export each entry of the multi-output classifier's .estimators_ list as a separate classifier. It does mean you have to modify the C code a little, as you now have multiple separate classifiers.

I'm afraid that is not a good idea. With 108 outputs, exporting each classifier and modifying the C code by hand will take a lot of work.