lyogavin / Anima

33B Chinese LLM, DPO QLoRA, 100K context, AirLLM 70B inference with a single 4GB GPU

configure the chunk split size

ageorgios opened this issue

Mac M1 Max 32GB user here, without the ability to quantize with bitsandbytes.

Is there a way to configure the chunk size so that inference is quicker? I think the 32GB of memory is not being used efficiently.
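
For reference, here is roughly the inference code I'm running, adapted from the project README; the commented-out `chunk_size` kwarg is hypothetical and only marks the knob I'm asking about, and the device handling on Apple Silicon is my assumption, not documented API:

```python
# Minimal AirLLM inference sketch (no bitsandbytes compression,
# since bitsandbytes is unavailable on the M1 Max).
from airllm import AirLLMLlama2

MAX_LENGTH = 128

model = AirLLMLlama2("garage-bAInd/Platypus2-70B-instruct")
# model = AirLLMLlama2("...", chunk_size=...)  # hypothetical knob this issue asks for

input_tokens = model.tokenizer(
    ["What is the capital of the United States?"],
    return_tensors="pt",
    truncation=True,
    max_length=MAX_LENGTH,
)

# Tokens are left on CPU here; the README calls .cuda(), which does not
# apply on an M1 Max, so device placement is part of my uncertainty.
generation_output = model.generate(
    input_tokens["input_ids"],
    max_new_tokens=20,
    use_cache=True,
)
print(model.tokenizer.decode(generation_output[0]))
```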