ggerganov/ggml

Tensor library for machine learning

Merge HF LoRA adapter with a quantized GPT-J model using ggml

webpolis opened this issue

Hello!

I have fine-tuned a GPT-J base model (loaded in 4-bit) using HF + LoRA. I also quantized the same base model to q4_0 using ggml, and it loads perfectly fine with the built examples/gpt-j binaries. Since HF doesn't yet support saving a model loaded in 4-bit together with its adapters, I need to find a different way to accomplish this.

I want to "merge" the LoRA adapters (converting them to ggml first?) with this q4_0 version so I can run inference on the CPU.
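One workaround I'm considering is to merge the adapter into the full-precision base model first and only then re-quantize with ggml. A rough sketch, assuming the adapter was trained with HF's PEFT library (model name and paths below are placeholders for my setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in fp16, not 4-bit: merging needs the full weights
base = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    torch_dtype=torch.float16,
)

# Attach the trained LoRA adapter (placeholder path)
model = PeftModel.from_pretrained(base, "./my-gptj-lora-adapter")

# Fold the low-rank updates into the base weights and drop the adapter layers
merged = model.merge_and_unload()

# Save a plain HF checkpoint that ggml's conversion script can consume
merged.save_pretrained("./gptj-merged")
AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B").save_pretrained("./gptj-merged")
```

After that I could presumably run the merged checkpoint through the examples/gpt-j conversion script and quantize to q4_0 again. The downside is that merging happens in fp16 and needs enough RAM for the full model, rather than applying the adapter directly to the existing q4_0 file.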

Any hints?