How to quantize a SeaLLM v2.5 model fine-tuned on my own dataset using llama.cpp
nvip12041994 opened this issue
After fine-tuning SeaLLM v2.5 according to the instructions, I ran the following commands:
```shell
python llama.cpp/convert.py SeaLLM-7B-v2.5/ --outtype f16 --outfile SeaLLM-7B-v2.5.fp16.bin
./llama.cpp/build/bin/quantize SeaLLM-7B-v2.5.fp16.bin SeaLLM-7B-v2.5.q4km.gguf
```
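For comparison, here is a sketch of the same two-step workflow using `convert-hf-to-gguf.py`, the converter recent llama.cpp versions provide for Hugging Face checkpoints; SeaLLM-7B-v2.5 is Gemma-based, which the legacy `convert.py` may not map correctly. Paths and names are assumed from my commands above, and `Q4_K_M` is the quantization type implied by "q4km"; the script only prints the commands rather than running them:

```shell
# Assumed names, taken from the commands in the post above.
MODEL_DIR=SeaLLM-7B-v2.5
F16=SeaLLM-7B-v2.5.fp16.gguf
QUANT=SeaLLM-7B-v2.5.q4km.gguf

# Step 1: HF checkpoint -> f16 GGUF, via the HF-aware converter
# (convert-hf-to-gguf.py) instead of the legacy convert.py.
echo "python llama.cpp/convert-hf-to-gguf.py $MODEL_DIR --outtype f16 --outfile $F16"

# Step 2: f16 GGUF -> Q4_K_M GGUF, with the quantization type given explicitly.
echo "./llama.cpp/build/bin/quantize $F16 $QUANT Q4_K_M"
```

This is only a sketch under the assumption that the conversion step, not the quantization step, is what produced the bad tensor shapes.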
The quantization itself completes, but when I try to use the model locally with LLM Studio, it cannot be loaded; during inference I get the following error:
```
llm_load_tensors: ggml ctx size = 0.13 MiB
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_q.weight' has wrong shape; expected 3072, 3072, got 3072, 4096, 1, 1
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'quantize_models/SeaLLM-7B-v2.5.q4km.gguf'
{"tid":"137551680389120","timestamp":1715238997,"level":"ERR","function":"load_model","line":685,"msg":"unable to load model","model":"'quantize_models/SeaLLM-7B-v2.5.q4km.gguf"}
```
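For what it's worth, the two shapes in the error are consistent with Gemma-7B (the base model of SeaLLM-7B-v2.5), whose query projection is wider than the hidden size because its head dimension is fixed at 256 rather than `hidden_size // num_heads`; a converter that assumes the latter would record the wrong shape. A quick arithmetic check, with config values assumed from the Gemma-7B Hugging Face config:

```python
# Assumed Gemma-7B config values (SeaLLM-7B-v2.5 is Gemma-based).
hidden_size = 3072
num_attention_heads = 16
head_dim = 256  # fixed in the config, NOT hidden_size // num_attention_heads (= 192)

# Actual q_proj weight shape: [num_heads * head_dim, hidden_size]
q_rows = num_attention_heads * head_dim
print(q_rows, hidden_size)  # 4096 3072 -- matches the "got 3072, 4096" in the log

# What a converter assuming head_dim = hidden_size / num_heads would expect:
print(hidden_size, hidden_size)  # 3072 3072 -- matches the "expected 3072, 3072"
```

If that reading is right, the fix would be on the conversion side (a converter and llama.cpp build that handle Gemma's head_dim), not in the quantization step.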
Please help me.