PEFT model inference is so slow!!

Question


KLGR123 opened this issue 9 months ago

As shown, I tried setting `load_in_8bit=False` and also calling `model = model.merge_and_unload()`, but neither helped. Generation is so slow it feels like the output will arrive 2000 years from now. Is there a solution yet?
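For context on what `merge_and_unload()` is expected to change, here is a minimal sketch. It assumes the adapter is LoRA, which the issue does not say. An unmerged LoRA layer computes `W x + scale * B (A x)`, which is the base matmul plus two small extra ones. Merging folds the update into a single weight `W + scale * B A`, so each layer does one matmul. The toy matrices and helper names (`lora_forward`, `merge`) are hypothetical, and `scale = lora_alpha / r` follows the usual LoRA convention. The sketch only checks that both paths give the same output. It does not diagnose the slowdown reported above.

```python
# Toy illustration of LoRA weight merging in plain Python (no PEFT involved).
# Assumption: the adapter is LoRA; matrices are lists of rows.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, v):
    """Multiply a matrix by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(M, N):
    """Element-wise sum of two same-shaped matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(M, N)]

def scaled(M, s):
    """Multiply every entry of M by s."""
    return [[s * a for a in row] for row in M]

def lora_forward(W, A, B, scale, x):
    """Unmerged path: base matmul plus the low-rank update B(Ax)."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]

def merge(W, A, B, scale):
    """Merged path: fold scale * B @ A into the base weight once."""
    return add(W, scaled(matmul(B, A), scale))

# Hypothetical toy sizes: d_out = d_in = 3, rank r = 1.
W = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[1, 2, 3]]          # r x d_in
B = [[1], [0], [2]]      # d_out x r
lora_alpha, r = 2, 1
scale = lora_alpha / r
x = [1, 1, 1]

print(lora_forward(W, A, B, scale, x))     # [13.0, 1.0, 25.0]
print(matvec(merge(W, A, B, scale), x))    # [13.0, 1.0, 25.0]
```

Merging removes only the small adapter matmuls. If most of the time goes elsewhere, for example into the base model's own matmuls or its generation loop, merging may not produce a noticeable speedup.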