qwopqwop200 / GPTQ-for-LLaMa

4-bit quantization of LLaMA using GPTQ

Help: Quantized llama-7b model with custom prompt format produces only gibberish

Glavin001 opened this issue

Could someone help me with how to quantize my own model with GPTQ-for-LLaMA?
See screenshot of the output I am getting 😢

Original full model: https://huggingface.co/Glavin001/startup-interviews-13b-int4-2epochs-1
Working quantized model with AutoGPTQ (screenshots): https://huggingface.co/Glavin001/startup-interviews-13b-2epochs-4bit-2
Dataset: https://huggingface.co/datasets/Glavin001/startup-interviews
Command I used to attempt quantization (with https://github.com/qwopqwop200/GPTQ-for-LLaMa ):

CUDA_VISIBLE_DEVICES=0 python3 llama.py /workspace/text-generation-webui/models/Glavin001_startup-interviews-13b-int4-2epochs-1/ c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors startup-interviews-llama7b-4bit-128g.safetensors

Quantized model (screenshots): Glavin001/startup-interviews-llama7b-v0.1-GPTQ ( https://huggingface.co/Glavin001/startup-interviews-llama7b-v0.1-GPTQ/tree/main )
Tested with (and reproducible via) TheBloke's Runpod template: https://github.com/TheBlokeAI/dockerLLM/
Model loader: both AutoGPTQ and ExLlama produce gibberish/garbage output.
Example prompt:

<|prompt|>What is a MVP?</s><|answer|>

Possible problems:
I'm still learning about quantization. I notice there is a calibration dataset argument, which I set to c4, but this model was fine-tuned on a different dataset and prompt style. I'm not sure how to customize that, though; maybe I need a custom Python script instead of the llama.py CLI?
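
For context, here is roughly what I imagine a custom calibration loader could look like. This is only a rough sketch: the question/answer column names are guesses at my dataset's fields, and the (inp, tar) tuples just mirror what the stock loaders in datautils.py return; how to actually plug this into llama.py's dataset argument is exactly the part I'm unsure about.

# Sketch only: build calibration samples from my own dataset instead of c4.
# "question"/"answer" column names are guesses -- adjust to the real fields
# in Glavin001/startup-interviews. The (inp, tar) shape mirrors what the
# stock GPTQ-for-LLaMa loaders return; hooking this into llama.py is TODO.
import random
import torch
from datasets import load_dataset
from transformers import AutoTokenizer

def get_custom_calibration(nsamples=128, seqlen=2048, seed=0,
                           model_path="/workspace/text-generation-webui/models/Glavin001_startup-interviews-13b-int4-2epochs-1/"):
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
    data = load_dataset("Glavin001/startup-interviews", split="train")
    random.seed(seed)
    samples = []
    while len(samples) < nsamples:
        # Concatenate formatted examples until one full window is reached;
        # individual Q&A pairs are likely much shorter than seqlen.
        ids = []
        while len(ids) < seqlen:
            row = data[random.randrange(len(data))]
            # Match the prompt format the model was fine-tuned on.
            text = f"<|prompt|>{row['question']}</s><|answer|>{row['answer']}"
            ids.extend(tokenizer(text).input_ids)
        inp = torch.tensor([ids[:seqlen]])
        tar = inp.clone()
        tar[:, :-1] = -100  # loss mask, same convention as the stock c4 loader
        samples.append((inp, tar))
    return samples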

It took an hour or so to generate this so I'd like to get it right next time 😂

Any advice would be greatly appreciated! Thanks in advance!

[Screenshots: broken output from GPTQ-for-LLaMa vs. working output from AutoGPTQ]

Try without --act-order.
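
e.g. the same command as above with only that flag dropped:

CUDA_VISIBLE_DEVICES=0 python3 llama.py /workspace/text-generation-webui/models/Glavin001_startup-interviews-13b-int4-2epochs-1/ c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors startup-interviews-llama7b-4bit-128g.safetensors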