lyogavin / Anima

33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU


CPU RAM offload

NicolasMejiaPetit opened this issue · comments

How can we offload the weights to CPU RAM instead of to disk?

Many people have, for example, 32-128 GB of RAM and a 3090.

With the layers offloaded to CPU RAM, we would get full PCIe transfer speeds instead of being bottlenecked by M.2 SSD read speeds.
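
Here is a minimal sketch of what that could look like, assuming PyTorch and a hypothetical stand-in model (AirLLM's actual per-layer loader, layer names, and API will differ): cache every layer's weights in pinned CPU RAM once, then stream each layer's weights to the GPU over PCIe just before running it, so only one layer is ever resident in VRAM.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

device = "cuda"

# Hypothetical stand-in for a decoder stack; AirLLM's real
# per-layer sharded model would replace this.
layers = nn.ModuleList(nn.Linear(4096, 4096) for _ in range(8))

# Cache every layer's weights in pinned CPU RAM once, instead of
# re-reading shards from disk on each forward pass. Pinned memory
# allows async DMA transfers, so PCIe is the only transfer path.
cpu_cache = [
    {name: t.detach().pin_memory() for name, t in layer.state_dict().items()}
    for layer in layers
]

@torch.no_grad()
def forward_offloaded(x: torch.Tensor) -> torch.Tensor:
    x = x.to(device)
    for layer, cached in zip(layers, cpu_cache):
        # CPU RAM -> VRAM copy for just this layer (non_blocking is
        # effective because the source tensors are pinned).
        gpu_weights = {n: t.to(device, non_blocking=True) for n, t in cached.items()}
        # Run the layer with the GPU copies; the module itself stays on CPU.
        x = functional_call(layer, gpu_weights, (x,))
        del gpu_weights  # release this layer's VRAM before loading the next
    return x

print(forward_offloaded(torch.randn(1, 4096)).shape)
```

The point of `pin_memory()` is that page-locked host memory enables asynchronous DMA copies, so the per-layer transfer runs at PCIe bandwidth rather than at NVMe read speed, while peak VRAM use stays at roughly one layer's worth of weights.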