siboehm / lleaves

Compiler for LightGBM gradient-boosted trees, based on LLVM. Speeds up prediction by ≥10x.

Home Page: https://lleaves.readthedocs.io/en/latest/

saving models persistently

nepslor opened this issue · comments

First of all, thank you for your impressive work.
I wanted to ask if there is a way to store the compiled models persistently. In my case the predictor is composed of ~100 LightGBM models, so compiling the full predictor is very time-consuming. When I tried to pickle the compiled lleaves model, I got:

ValueError: ctypes objects containing pointers cannot be pickled

for which I guess there is no easy workaround. Do you know if it is possible to avoid re-compiling the original LightGBM models?
Thank you
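(For context, this error is easy to reproduce with the stdlib alone: a raw C pointer is only meaningful inside the process that created it, so pickle refuses to serialize any ctypes object that contains one.)

```python
import ctypes
import pickle

ptr = ctypes.c_void_p(0)  # a ctypes object wrapping a bare C pointer

try:
    pickle.dumps(ptr)
except ValueError as err:
    # A raw address would be dangling in any other process, so pickle
    # raises rather than serialize it.
    print(err)  # -> ctypes objects containing pointers cannot be pickled
```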

If you cache the result of the compilation then you won't have to recompile. Example:

from lleaves import Model
model_file = "tests/models/NYC_taxi/model.txt"
m = Model(model_file=model_file)
m.compile(cache="NYC_taxi.o")

During the first run lleaves will compile the model and save the result to a file called NYC_taxi.o. If you run the same code again, lleaves will see that the file already exists and load the compiled model without compiling it again.

So instead of pickling, just save the model.txt and the NYC_taxi.o. Does that solve your use case?

I should probably document this better. I can look into making lleaves.Model pickle-able, but intuitively I feel like this'll just hide the fact that there's an underlying compiled object file (NYC_taxi.o) which you can't just move around between different CPUs / Operating Systems. Maybe lleaves.Model should be pickle-able without saving the object-file, so at least you won't need to store the model.txt? Feel free to let me know if you have opinions on this.
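(For what it's worth, the usual pattern for making such a wrapper pickle-able is to drop the ctypes handle in `__getstate__` and rebuild it from the saved paths after unpickling. A minimal sketch with a toy class; the names and structure here are hypothetical, not lleaves' actual internals:)

```python
import ctypes
import pickle

class CompiledModel:
    """Toy stand-in for a wrapper around compiled native code.

    Hypothetical sketch only; lleaves' real internals differ.
    """

    def __init__(self, model_file):
        self.model_file = model_file
        self.cache = None
        self._entry = None  # ctypes handle into the compiled code

    def compile(self, cache=None):
        self.cache = cache
        # Stand-in for loading the compiled object file:
        self._entry = ctypes.c_void_p(0)  # unpicklable: contains a pointer

    def __getstate__(self):
        # Drop the unpicklable handle; keep the paths needed to rebuild it.
        state = self.__dict__.copy()
        state["_entry"] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # A real implementation would recompile (or reload self.cache)
        # here, or lazily on the first predict() call.

m = CompiledModel("model.txt")
m.compile(cache="model.o")
m2 = pickle.loads(pickle.dumps(m))  # no ValueError: the pointer was dropped
```

The trade-off is exactly the one described above: the unpickled object is only usable if the cached object file (or the model.txt to recompile from) is still reachable on the target machine.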

I don't know about other users, but we never save the booster to a text file. We tried creating a temporary file specifically for lleaves and storing the compiled model as a fitted attribute of the LGBM regressor, but we can't save the resulting object to MLflow (because we can't pickle it), at which point it isn't worth the bother.

@siboehm thanks! This is perfect and completely solves my use case

@nepslor I'm glad! closing