usyd-fsalab / fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
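For context on what an x-bit float like FP6 looks like, here is a minimal decode sketch. It assumes the common 1-3-2 sign/exponent/mantissa (E3M2) split with bias 3 and no inf/NaN encodings; the repo's kernels may use a different layout or bias, so treat this as illustrative only.

```python
def decode_fp6_e3m2(bits: int) -> float:
    """Decode a 6-bit value, assuming a 1-3-2 (sign/exp/mantissa) layout."""
    sign = -1.0 if (bits >> 5) & 0x1 else 1.0
    exp = (bits >> 2) & 0x7       # 3 exponent bits
    man = bits & 0x3              # 2 mantissa bits
    bias = 3                      # 2**(3-1) - 1
    if exp == 0:                  # subnormal: no implicit leading 1
        return sign * (man / 4.0) * 2.0 ** (1 - bias)
    return sign * (1.0 + man / 4.0) * 2.0 ** (exp - bias)

# Under these assumptions the representable magnitudes span 0.0625 to 28.0.
```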

Can we get FP4?

catid opened this issue

FP6 doesn't seem to be a useful size. The best models we can run are 70B, and only 4-bit models will fit in ~40-48 GB of VRAM.
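A quick back-of-the-envelope check of the weights-only footprint (ignoring KV cache and activations) illustrates the point; the 70B parameter count and 40-48 GB card sizes are taken from the comment above:

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only memory footprint in GiB (ignores KV cache/activations)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

for bits in (4, 5, 6, 16):
    print(f"70B @ FP{bits}: {weight_gib(70, bits):.1f} GiB")
# 70B @ FP4:  ~32.6 GiB -> fits in a 40-48 GB card with headroom
# 70B @ FP5:  ~40.7 GiB -> tight
# 70B @ FP6:  ~48.9 GiB -> does not fit
# 70B @ FP16: ~130.4 GiB
```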

We will support FP5 soon. Yes, I will also try to support FP4.