Vahe1994 / SpQR

Which dataset should I use?

ccccj opened this issue

Hello, I have a question. I have a llama-series model that has been fine-tuned on my own dataset. If I want to quantize it with SpQR, should I also use data/red_pajama_n=1024.pth for the parameter, or should I use the dataset I fine-tuned on?
Looking forward to your response!

Hello @ccccj ,
If you are focused on the best performance in a specific domain (presumably the reason for having your own dataset), you may get slightly better results by using your own dataset for SpQR quantization. Just take a subset comparable in size to data/red_pajama_n=1024.pth.
red_pajama should also give decent results. If you can try both, please write back here with your quality measurements.
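
For illustration, here is a minimal sketch of what "a subset comparable in size" could look like, assuming the `n=1024` in the file name means 1024 calibration sequences. The function name `sample_calibration_windows`, the sequence length of 2048, and the plain list-of-token-IDs input are all assumptions made for this example; none of them comes from the SpQR repository.

```python
import random
from typing import List, Sequence


def sample_calibration_windows(
    token_ids: Sequence[int],
    n_samples: int = 1024,   # assumed to match the "n=1024" in the file name
    seq_len: int = 2048,     # assumed sequence length, adjust to your model
    seed: int = 0,
) -> List[List[int]]:
    """Draw n_samples random contiguous windows of seq_len tokens.

    token_ids is the fine-tuning corpus, already tokenized and
    concatenated into one flat sequence (hypothetical input format).
    """
    if len(token_ids) < seq_len:
        raise ValueError("corpus is shorter than one calibration sequence")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    last_start = len(token_ids) - seq_len
    windows = []
    for _ in range(n_samples):
        start = rng.randint(0, last_start)  # randint is inclusive on both ends
        windows.append(list(token_ids[start:start + seq_len]))
    return windows


if __name__ == "__main__":
    # Stand-in for a tokenized fine-tuning corpus (made-up data).
    corpus = list(range(100_000))
    calib = sample_calibration_windows(corpus, n_samples=1024, seq_len=2048)
    print(len(calib), len(calib[0]))
```

How the sampled windows need to be stored so that SpQR can load them is not covered in this thread, so check the repository's data-loading code before saving them.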