33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU
ageorgios opened this issue 6 months ago · comments
Mac M1 Max 32GB user here, without the ability to quantize via bitsandbytes.
Is there a way to configure the chunk size so that inference is quicker? I think the 32GB of memory is not being used efficiently.
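For context, a toy simulation (not AirLLM's actual API; `simulate`, `NUM_LAYERS`, and `LAYER_BYTES` are illustrative assumptions) of why a larger "chunk size" — the number of layers kept resident at once during layer-by-layer inference — trades fewer disk reads for higher peak memory:

```python
NUM_LAYERS = 80            # e.g. a 70B Llama-style model has 80 decoder layers
LAYER_BYTES = 400 * 2**20  # rough per-layer weight size, illustrative only

def simulate(chunk_size):
    """Return (disk_loads, peak_bytes) for one forward pass that
    streams layers from disk in groups of `chunk_size`."""
    disk_loads = 0
    peak_bytes = 0
    for start in range(0, NUM_LAYERS, chunk_size):
        group = range(start, min(start + chunk_size, NUM_LAYERS))
        disk_loads += 1  # one read per chunk of layers
        peak_bytes = max(peak_bytes, len(group) * LAYER_BYTES)
    return disk_loads, peak_bytes

for k in (1, 4, 8):
    loads, peak = simulate(k)
    print(f"chunk_size={k}: {loads} disk reads, peak ~{peak / 2**30:.1f} GiB")
```

With 32GB of unified memory, keeping several layers resident at once could cut the number of disk reads substantially while staying well under the memory ceiling, which is presumably what this request is after.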