siboehm / lleaves

Compiler for LightGBM gradient-boosted trees, based on LLVM. Speeds up prediction by ≥10x.

Home Page: https://lleaves.readthedocs.io/en/latest/

Specify C interface: Prediction array zeroed out or overwrite?

hdosuperuser opened this issue · comments

I've noticed unusual behavior when using compiled models from C++.
For a fixed set of feature values, the first call to forest_root gives the same result as LightGBM or lleaves in Python. But when I call forest_root again with the same values, the result is 2 x the target?

To rule out other causes, I compared LightGBM and lleaves in Python and got the expected (valid) results, so the difference only appears in C++: calling forest_root twice with the same parameters gives different results.

I guess you can reproduce this with any compiled model you have prepared.

#include "c_bench.h"
#include <iostream>
#include <vector>
int main()
{
    std::vector<double> features {/*feature values*/};

    double prediction {0};
    forest_root(features.data(), &prediction, 0, 1); // Valid prediction result
    forest_root(features.data(), &prediction, 0, 1); // 2 x previous result, invalid? Should set prediction to the same value as the first call

    std::cout << "Prediction: " << prediction << std::endl;
}

Here is another hint: when I use a separate, fresh variable for each prediction result, both calls are correct.

    double prediction1 {0};
    double prediction2 {0};

    forest_root(features.data(), &prediction1, 0, 1); // correct
    forest_root(features.data(), &prediction2, 0, 1); // correct

It doesn't give different results; it's just that I assume the prediction array is zeroed, and I += the results into it. In Python I create a new result array for each call, which is why it works there.
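
To make the accumulate behaviour concrete, here is a minimal sketch. It does not use a real compiled model: fake_forest_root is a hypothetical stand-in that only mimics the += into the output array, and kNumFeatures and predict_overwrite are names invented for this illustration. The argument order follows the calls in the snippet above.

```cpp
#include <algorithm>
#include <cassert>

// Assumed feature count for this toy model (hypothetical).
constexpr int kNumFeatures = 2;

// Hypothetical stand-in, NOT the real forest_root. The "score" of a row is
// just the sum of its features; what matters is that the score is added to
// out[row] rather than assigned to it.
void fake_forest_root(const double* data, double* out, int start, int end) {
    for (int row = start; row < end; ++row) {
        double score = 0.0;
        for (int f = 0; f < kNumFeatures; ++f) {
            score += data[row * kNumFeatures + f];
        }
        out[row] += score; // accumulates: the caller must zero out[] first
    }
}

// Wrapper that makes repeated calls safe by clearing the output range
// before delegating to the accumulating function.
void predict_overwrite(const double* data, double* out, int start, int end) {
    assert(start <= end);
    std::fill(out + start, out + end, 0.0);
    fake_forest_root(data, out, start, end);
}
```

With the stand-in, calling fake_forest_root twice on the same output variable doubles the value, while predict_overwrite returns the same value on every call. In the snippet from the report, resetting prediction to 0 between the two forest_root calls should have the same effect.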

Thanks for the quick reply. I was not aware of this and could not track down what was going on, since in my case I basically just change one feature value between two calls (i.e. a direction kind of feature).

I see why you ran into this problem though: the C interface is not specified anywhere! I may have a look at this in the future, either by specifying the API or by just overwriting the array. Thank you for raising this :) I'll adjust the title so others find it more easily.

OK, I created a fix that is also faster because it gets rid of some load instructions.

Closed by #47