siboehm / lleaves

Compiler for LightGBM gradient-boosted trees, based on LLVM. Speeds up prediction by ≥10x.

Home Page: https://lleaves.readthedocs.io/en/latest/

saving models persistently

nepslor opened this issue · comments

First of all, thank you for your impressive work.
I wanted to ask if there is a way to store the compiled models persistently. In my case the predictor is composed of ~100 LightGBM models, so compiling the full predictor is very time-consuming. When I tried to pickle the compiled lleaves model, I got:

ValueError: ctypes objects containing pointers cannot be pickled

for which I guess there is no easy workaround. Do you know if it is possible to avoid re-compiling the original LightGBM models?
Thank you
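(For context, this error is easy to reproduce with the stdlib alone: a raw C pointer is only meaningful inside the process that created it, so pickle refuses to serialize any ctypes object that contains one.)

```python
import ctypes
import pickle

ptr = ctypes.c_void_p(0)  # a ctypes object wrapping a bare C pointer

try:
    pickle.dumps(ptr)
except ValueError as err:
    # A raw address would be dangling in any other process, so pickle
    # raises rather than serialize it.
    print(err)  # -> ctypes objects containing pointers cannot be pickled
```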

If you cache the result of the compilation then you won't have to recompile. Example:

from lleaves import Model
model_file = "tests/models/NYC_taxi/model.txt"
m = Model(model_file=model_file)
m.compile(cache="NYC_taxi.o")

During the first run lleaves will compile the model and save the result to a file called NYC_taxi.o. If you run the same code again, lleaves will see that the file already exists and load the compiled model without compiling it again.

So instead of pickling, just save the model.txt and the NYC_taxi.o. Does that solve your use case?

I should probably document this better. I can look into making lleaves.Model pickle-able, but intuitively I feel like this'll just hide the fact that there's an underlying compiled object file (NYC_taxi.o) which you can't just move around between different CPUs / Operating Systems. Maybe lleaves.Model should be pickle-able without saving the object-file, so at least you won't need to store the model.txt? Feel free to let me know if you have opinions on this.
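(For what it's worth, the usual pattern for making such a wrapper pickle-able is to drop the ctypes handle in `__getstate__` and rebuild it from the saved paths after unpickling. A minimal sketch with a toy class; the names and structure here are hypothetical, not lleaves' actual internals:)

```python
import ctypes
import pickle

class CompiledModel:
    """Toy stand-in for a wrapper around compiled native code.

    Hypothetical sketch only; lleaves' real internals differ.
    """

    def __init__(self, model_file):
        self.model_file = model_file
        self.cache = None
        self._entry = None  # ctypes handle into the compiled code

    def compile(self, cache=None):
        self.cache = cache
        # Stand-in for loading the compiled object file:
        self._entry = ctypes.c_void_p(0)  # unpicklable: contains a pointer

    def __getstate__(self):
        # Drop the unpicklable handle; keep the paths needed to rebuild it.
        state = self.__dict__.copy()
        state["_entry"] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # A real implementation would recompile (or reload self.cache)
        # here, or lazily on the first predict() call.

m = CompiledModel("model.txt")
m.compile(cache="model.o")
m2 = pickle.loads(pickle.dumps(m))  # no ValueError: the pointer was dropped
```

The trade-off is exactly the one described above: the unpickled object is only usable if the cached object file (or the model.txt to recompile from) is still reachable on the target machine.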

I don't know about other users, but we never save the booster to a text file. We tried creating a temporary file specifically for lleaves and storing the compiled model as a fitted attribute of the LGBM regressor, but we can't save the resulting object to MLflow (because we can't pickle it), at which point it isn't worth the bother.

@siboehm thanks! This is perfect and completely solves my use case

@nepslor I'm glad! closing