4 bits quantization of LLaMA using GPTQ
JianbangZ opened this issue a year ago · comments
For the base FP16 model, `--eval` gives 5.68 PPL on wikitext2, while `--benchmark 2048` gives 6.43 on wikitext2.
What's the difference between the two?
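For reference, perplexity in both modes is exp of the mean per-token negative log-likelihood, so the two numbers can legitimately differ if the modes segment the text differently (e.g. context resets at chunk boundaries raise the NLL of early tokens in each chunk). A minimal sketch of the formula with hypothetical NLL values, not the repo's actual evaluation code:

```python
import math

def perplexity(nlls):
    # Perplexity = exp of the mean per-token negative log-likelihood (nats).
    return math.exp(sum(nlls) / len(nlls))

# Hypothetical per-token NLLs: with a long running context the model is
# steadily confident; if the context is reset every fixed-size chunk, the
# first tokens after each reset are harder, raising the mean NLL and PPL.
long_context_nlls = [1.70, 1.72, 1.74]
chunked_nlls      = [2.10, 1.72, 1.74]  # first token after a reset is worse

print(perplexity(long_context_nlls))
print(perplexity(chunked_nlls))
```

Under this view, the gap between 5.68 and 6.43 would come from how each mode windows the evaluation set, not from the model weights themselves.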