siboehm / lleaves

Compiler for LightGBM gradient-boosted trees, based on LLVM. Speeds up prediction by ≥10x.

Home Page: https://lleaves.readthedocs.io/en/latest/

lleaves costs too much memory

111qqz opened this issue · comments

Thanks for your great work.
I want to try it out, but the memory consumption is staggering: on a machine with 32 GB of memory, compilation runs out of memory.
Is there something I haven't set correctly?

import lleaves

MODEL_TXT_PATH = '/home/kk/models/lgb080401_amp.txt'

# Load the LightGBM model.txt and compile it, caching the binary.
llvm_model = lleaves.Model(model_file=MODEL_TXT_PATH)
llvm_model.compile(cache='./lleaves.bin', fblocksize=34)

Interesting, not a problem that I have encountered before. Your settings look fine. Some ideas / questions:

  1. The OOM occurs during compilation, correct? Could you try setting finline=False? That shouldn't change the predictions, but compilation will finish much faster.
  2. How big is the forest you're trying to compile? The biggest ones I've used myself have had ~1000 trees, but others have used lleaves on even bigger models. Can you send me the model.txt, or is it private?
  3. On which function / line does the problem actually occur? The in-memory structures I use for loading the tree are not very memory-efficient, but your tree would have to be extremely large for that to become a problem. I'd assume the problem occurs in the LLVM backend.
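
For reference, here is the suggestion from point 1 applied to the script above (a minimal sketch; finline and fblocksize are documented keyword arguments of Model.compile, and the cache filename here is a hypothetical choice so the previously cached binary isn't just reloaded):

import lleaves

MODEL_TXT_PATH = '/home/kk/models/lgb080401_amp.txt'

llvm_model = lleaves.Model(model_file=MODEL_TXT_PATH)
# finline=False compiles each tree as its own function instead of
# inlining the whole forest into one huge function, which keeps the
# LLVM optimization passes working on small functions.
llvm_model.compile(cache='./lleaves_noinline.bin', fblocksize=34, finline=False)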

@siboehm

  1. Yes, the OOM happens during compilation. I tried setting finline=False and compilation finished normally, without OOM.
  2. My model has exactly 1000 trees; the model.txt file is about 30 MB. It is indeed a large model; sorry, I can't share it.
  3. I don't know which function the OOM occurs in. Is there any way to get a more detailed log?

I'm still thinking about this, but here are my current conclusions:

  1. If compilation completes once you disable inlining, the OOM most likely occurs in the LLVM backend. Why might this happen? With finline=True, lleaves compiles the whole forest into a single function. That is roughly like writing a single C function whose source file is 30 MB, which is enormous. The compiler's optimization passes don't expect functions this large, and some passes have memory requirements that are non-linear in function size, which would explain your OOM. Disabling inlining via finline=False solves this, since lleaves then generates 1000 functions (one per tree in the forest). Disabling inlining does slow down prediction quite a bit when the trees are small (yours aren't) due to function call overhead and a smaller optimization scope, but do your own benchmarking; it may not be a big problem in your case.
  2. Your model is just very large. The largest model currently in my test zoo is the mtpl2 model with 1000 trees, but its model.txt is only 3 MB (so your trees are presumably much deeper, which explains the size difference).
  3. Yes, it may be possible to get the Python backtrace at the time of the OOM by running your compile script under gdb's Python plugin (I'm not sure; I haven't done much memory-error debugging). You can also use the regular Python debugger to confirm that the error occurs at this Python line, which is where the LLVM backend generates assembly from the LLVM IR that lleaves emitted. One way to get a traceback instead of a hard kill is sketched below.
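
For anyone who wants that traceback: a sketch (assuming Linux) that caps the process's address space with the standard-library resource module, so the failing allocation raises a catchable MemoryError instead of triggering the kernel OOM killer. Caveat: if the allocation happens inside the native LLVM library, it may abort rather than raise.

import resource
import lleaves

# Cap the address space at ~24 GB (the value is a hypothetical example;
# tune it to sit below your machine's physical RAM). When the limit is
# hit, Python-level allocations fail with MemoryError and you get a
# full traceback pointing at the offending line, instead of the kernel
# OOM killer silently terminating the process.
LIMIT = 24 * 1024**3
resource.setrlimit(resource.RLIMIT_AS, (LIMIT, LIMIT))

llvm_model = lleaves.Model(model_file='/home/kk/models/lgb080401_amp.txt')
llvm_model.compile(cache='./lleaves.bin')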

So my advice would be:

  1. Benchmark and see whether running lleaves without inlining is fast enough for your use case. In your case, the difference with vs. without inlining should not be large anyway.
  2. Rent a bigger machine with more RAM from AWS (lol). I've done this a lot in the past; it's not that expensive to get 256 GB of RAM for an hour or so. Actually, thinking about it again, this may be hard due to #27. But it would let you judge how much performance you're missing out on by not inlining.
  3. Manually split your model into smaller parts by editing the model.txt, compile the parts individually with lleaves, and stitch the resulting binaries together with a bit of C code (a Python-level variant is sketched after this list). This is entirely possible but requires some manual work, and it may not be easy, depending on how familiar you are with the internals of your decision tree (e.g. the prediction transform applied to the summed results). It would take me a few hours to pull off, and that's with familiarity with the structure of the model.txt, so I don't know whether it's an option for you.
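
A Python-level sketch of the stitching idea from point 3, with heavy assumptions: the sub-model filenames are hypothetical, producing them by hand requires care with model.txt header fields such as tree_sizes, and the final sigmoid assumes a binary log-loss objective. raw_score is a documented compile flag; it makes each part return its untransformed sum of leaf values, which is what makes the parts additive.

import numpy as np
import lleaves

# Hypothetical filenames: sub_0.txt .. sub_3.txt are hand-made splits
# of the big model.txt, each holding a slice of the 1000 trees.
PARTS = [f'sub_{i}.txt' for i in range(4)]

models = []
for path in PARTS:
    m = lleaves.Model(model_file=path)
    # raw_score=True makes each part return the untransformed sum of
    # its trees' leaf values; boosted-tree predictions are additive.
    m.compile(cache=path + '.bin', raw_score=True)
    models.append(m)

def predict(X):
    # Sum the raw scores of all parts, then apply the objective's
    # transform once (sigmoid here, assuming binary log-loss).
    raw = sum(m.predict(X) for m in models)
    return 1.0 / (1.0 + np.exp(-raw))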

If it is true that the OOM occurs on the line I mentioned in conclusion (3), then I'd consider this an edge case of lleaves usage (your model.txt is enormous) and not something I'll mitigate in the near future.

@siboehm
Thank you very much for the detailed suggestions. I'll try what you suggested and update my progress here.

Just a note for myself: the way to fix this would be to split the model internally into functions (blocks of trees, iterating over the data), ensuring that every call to lleaves results in exactly N function calls. Then I can disable inlining for these N functions, so they are compiled independently (much lower memory demand than one large function) while the cost stays reasonable (a few function calls per predict invocation are very cheap).
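
A toy llvmlite sketch of the shape this could take (lleaves generates its IR via llvmlite; the tree bodies below are placeholders, and this is only an illustration of the idea, not lleaves' actual codegen):

from llvmlite import ir

# Emit one function per block of trees, mark each noinline so LLVM
# optimizes many small functions instead of one huge one, and have a
# root function call them all.
module = ir.Module(name='forest')
dbl = ir.DoubleType()
fnty = ir.FunctionType(dbl, [dbl])

block_funcs = []
for i in range(4):  # e.g. 4 blocks of 250 trees each
    fn = ir.Function(module, fnty, name=f'tree_block_{i}')
    fn.attributes.add('noinline')  # compile each block independently
    builder = ir.IRBuilder(fn.append_basic_block('entry'))
    builder.ret(fn.args[0])  # placeholder for the real tree logic
    block_funcs.append(fn)

root = ir.Function(module, fnty, name='forest_root')
builder = ir.IRBuilder(root.append_basic_block('entry'))
acc = ir.Constant(dbl, 0.0)
for fn in block_funcs:
    # A handful of calls per predict() invocation is very cheap.
    acc = builder.fadd(acc, builder.call(fn, [root.args[0]]))
builder.ret(acc)
print(module)  # dump the generated LLVM IR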

I have similar memory issues. I'm compiling forests with 500 trees and 250 leaves per tree on a 32 GB machine. Compilation takes far too long and I run out of memory.

  1. Setting finline=False worked for me, but it made prediction so slow that it ended up slower than LightGBM's out-of-the-box inference.
  2. Your idea to batch is a great one. I was about to open an issue suggesting exactly this, and then I read this thread :).
  3. Is there a way to get an n_jobs option for compilation? Without inlining, or with batched inlining, you're describing independent compilations. I have machines with 100-200 cores, but lleaves can only use one of them at a time. (A process-level workaround is sketched below.)
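
Until something like n_jobs exists, a process-level workaround sketch (hypothetical, building on the split-model idea above: each sub-model is compiled in its own process, and the resulting cache files load near-instantly when passed to compile(cache=...) again):

from multiprocessing import Pool
import lleaves

# Hypothetical sub-model files, as in the splitting sketch above.
PARTS = [f'sub_{i}.txt' for i in range(4)]

def compile_part(path):
    # Each worker process runs its own single-threaded LLVM compile.
    m = lleaves.Model(model_file=path)
    m.compile(cache=path + '.bin', raw_score=True)
    return path + '.bin'

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        caches = pool.map(compile_part, PARTS)
    print('compiled:', caches)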