qwopqwop200 / GPTQ-for-LLaMa

4-bit quantization of LLaMA using GPTQ

Help: Quantized llama-7b model with custom prompt format produces only gibberish

Glavin001 opened this issue

Could someone help me with how to quantize my own model with GPTQ-for-LLaMA?
See screenshot of the output I am getting 😢

Original full model: https://huggingface.co/Glavin001/startup-interviews-13b-int4-2epochs-1
Working quantized model with AutoGPTQ (screenshots): https://huggingface.co/Glavin001/startup-interviews-13b-2epochs-4bit-2
Dataset: https://huggingface.co/datasets/Glavin001/startup-interviews
Command I used to attempt quantization (with https://github.com/qwopqwop200/GPTQ-for-LLaMa ):

CUDA_VISIBLE_DEVICES=0 python3 llama.py /workspace/text-generation-webui/models/Glavin001_startup-interviews-13b-int4-2epochs-1/ c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors startup-interviews-llama7b-4bit-128g.safetensors

Quantized model (screenshots): Glavin001/startup-interviews-llama7b-v0.1-GPTQ ( https://huggingface.co/Glavin001/startup-interviews-llama7b-v0.1-GPTQ/tree/main )
Tested with (and reproducible via) TheBloke's Runpod template: https://github.com/TheBlokeAI/dockerLLM/
Model loader: both AutoGPTQ and ExLlama produce gibberish/garbage output.
Example prompt:

<|prompt|>What is a MVP?</s><|answer|>

Possible problems:
I'm still learning about quantization. I notice there is a calibration dataset argument, which I set to c4, but this model was fine-tuned on a different dataset and prompt style. I'm not sure how to customize that, though; maybe I need a custom Python script instead of the llama.py CLI?
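
For context, here is roughly what I imagine a custom calibration loader could look like. This is only a rough sketch: the question/answer column names are guesses at my dataset's fields, and the (inp, tar) tuples just mirror what the stock loaders in datautils.py return; how to actually plug this into llama.py's dataset argument is exactly the part I'm unsure about.

# Sketch only: build calibration samples from my own dataset instead of c4.
# "question"/"answer" column names are guesses -- adjust to the real fields
# in Glavin001/startup-interviews. The (inp, tar) shape mirrors what the
# stock GPTQ-for-LLaMa loaders return; hooking this into llama.py is TODO.
import random
import torch
from datasets import load_dataset
from transformers import AutoTokenizer

def get_custom_calibration(nsamples=128, seqlen=2048, seed=0,
                           model_path="/workspace/text-generation-webui/models/Glavin001_startup-interviews-13b-int4-2epochs-1/"):
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
    data = load_dataset("Glavin001/startup-interviews", split="train")
    random.seed(seed)
    samples = []
    while len(samples) < nsamples:
        # Concatenate formatted examples until one full window is reached;
        # individual Q&A pairs are likely much shorter than seqlen.
        ids = []
        while len(ids) < seqlen:
            row = data[random.randrange(len(data))]
            # Match the prompt format the model was fine-tuned on.
            text = f"<|prompt|>{row['question']}</s><|answer|>{row['answer']}"
            ids.extend(tokenizer(text).input_ids)
        inp = torch.tensor([ids[:seqlen]])
        tar = inp.clone()
        tar[:, :-1] = -100  # loss mask, same convention as the stock c4 loader
        samples.append((inp, tar))
    return samples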

It took an hour or so to generate this so I'd like to get it right next time 😂

Any advice would be greatly appreciated! Thanks in advance!

[Screenshots: broken output from GPTQ-for-LLaMa vs. working output from AutoGPTQ]

Try without --act-order.
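
e.g. the same command as above with only that flag dropped:

CUDA_VISIBLE_DEVICES=0 python3 llama.py /workspace/text-generation-webui/models/Glavin001_startup-interviews-13b-int4-2epochs-1/ c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors startup-interviews-llama7b-4bit-128g.safetensors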