Does it support the new GGMLv3 quantization methods?

Exotik850 opened this issue a year ago

I tried using the CLI application to see how far it had come since it was llama-rs, and noticed that an error popped up when loading one of the newer WizardLM uncensored models that use the GGMLv3 format:

llm llama chat --model-path .\Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_1.bin
⣾ Loading model...Error:
   0: Could not load model
   1: invalid file format version 3

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

Am I using it the wrong way or is it not supported yet?
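The error reports the version number stored in the model file's header. As a rough diagnostic, here is a minimal Rust sketch that reads that header. It assumes the llama.cpp-era layout: a little-endian u32 magic (ggml, ggmf or ggjt), followed by a u32 version for the versioned formats. It also assumes that files labelled "GGMLv3" are ggjt version 3. The ggml-header program name and the describe_header and read_u32_le helpers are hypothetical and are not part of llm.

```rust
use std::env;
use std::fs::File;
use std::io::{self, Read};

// Magic numbers of llama.cpp-era GGML containers, read as little-endian u32 values.
const MAGIC_GGML: u32 = 0x6767_6d6c; // "ggml": original format, no version field
const MAGIC_GGMF: u32 = 0x6767_6d66; // "ggmf": adds a version field
const MAGIC_GGJT: u32 = 0x6767_6a74; // "ggjt": versioned; "GGMLv3" files assumed to be ggjt v3

/// Reads a little-endian u32 starting at `offset`, if enough bytes are present.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let b = bytes.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Describes the container format from the first (up to) 8 bytes of a model file.
fn describe_header(bytes: &[u8]) -> Result<String, String> {
    let magic = read_u32_le(bytes, 0).ok_or("file too short for a magic number")?;
    let name = match magic {
        MAGIC_GGML => return Ok("ggml (unversioned)".to_string()),
        MAGIC_GGMF => "ggmf",
        MAGIC_GGJT => "ggjt",
        other => return Err(format!("unknown magic 0x{other:08x}")),
    };
    // Versioned formats store a u32 version immediately after the magic.
    let version = read_u32_le(bytes, 4).ok_or("file too short for a version field")?;
    Ok(format!("{name} version {version}"))
}

fn main() -> io::Result<()> {
    // Hypothetical usage: ggml-header path/to/model.bin
    let Some(path) = env::args().nth(1) else {
        eprintln!("usage: ggml-header <model.bin>");
        return Ok(());
    };
    // Read at most the first 8 bytes (magic + version).
    let mut header = Vec::with_capacity(8);
    File::open(&path)?.take(8).read_to_end(&mut header)?;
    match describe_header(&header) {
        Ok(desc) => println!("{path}: {desc}"),
        Err(err) => println!("{path}: {err}"),
    }
    Ok(())
}
```

If the assumptions above hold, the Wizard-Vicuna file from the report should come out as ggjt version 3, matching the version number in the error message.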

Philpax · Answer 1 · Tue May 30 2023 03:32:34 GMT+0800 (China Standard Time)

Hi there! Yes, it's supported, but only on the latest version (the main branch); we haven't cut a new release yet. Hope to have that sorted soon!

Exotik850 · Answer 2 · Tue May 30 2023 23:32:00 GMT+0800 (China Standard Time)

My apologies, I should have tried the main branch instead of just the release 😅

Philpax · Answer 3 · Thu Jun 01 2023 02:57:54 GMT+0800 (China Standard Time)

No worries! I'll keep this open for now and pin it for people's reference until we get the release out the door :)

Sam Brew · Answer 4 · Sat Aug 19 2023 09:32:10 GMT+0800 (China Standard Time)

@philpax, have you considered publishing pre-releases such as 0.2.0-beta.1, 0.2.0-beta.2 and so on to crates.io? This pattern has worked very well for some of my own projects.
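The pre-release pattern relies on SemVer 2.0.0 precedence: 0.2.0-beta.1 sorts below 0.2.0. It also relies on Cargo selecting a pre-release only when a version requirement explicitly asks for one, so existing users are not upgraded by accident. Below is a simplified Rust sketch of that precedence rule. It compares the core numbers first, then the dot-separated pre-release identifiers, with numeric identifiers compared as integers. It ignores build metadata and leading-zero validation. Version, parse, cmp_ident and cmp_versions are illustrative names, not the API of the semver crate or of Cargo.

```rust
use std::cmp::Ordering;

/// Simplified SemVer version: MAJOR.MINOR.PATCH plus optional pre-release
/// identifiers. Build metadata ("+...") is not handled in this sketch.
#[derive(Debug)]
struct Version {
    raw: String,
    core: (u64, u64, u64),
    pre: Vec<String>,
}

/// Parses strings like "0.2.0" or "0.2.0-beta.1"; returns None on malformed input.
fn parse(s: &str) -> Option<Version> {
    let (core_str, pre): (&str, Vec<String>) = match s.split_once('-') {
        Some((c, p)) => (c, p.split('.').map(str::to_string).collect()),
        None => (s, Vec::new()),
    };
    let mut nums = core_str.split('.').map(|n| n.parse::<u64>().ok());
    // Each `??` bails out on a missing or non-numeric core component.
    let core = (nums.next()??, nums.next()??, nums.next()??);
    if nums.next().is_some() {
        return None; // more than three core components
    }
    Some(Version { raw: s.to_string(), core, pre })
}

/// Compares two pre-release identifiers per SemVer precedence rules.
fn cmp_ident(a: &str, b: &str) -> Ordering {
    match (a.parse::<u64>(), b.parse::<u64>()) {
        (Ok(x), Ok(y)) => x.cmp(&y),         // both numeric: compare as integers
        (Ok(_), Err(_)) => Ordering::Less,   // numeric sorts before alphanumeric
        (Err(_), Ok(_)) => Ordering::Greater,
        (Err(_), Err(_)) => a.cmp(b),        // both alphanumeric: ASCII order
    }
}

/// Orders versions: core numbers first, then pre-release identifiers.
fn cmp_versions(a: &Version, b: &Version) -> Ordering {
    a.core.cmp(&b.core).then_with(|| match (a.pre.is_empty(), b.pre.is_empty()) {
        (true, true) => Ordering::Equal,
        (true, false) => Ordering::Greater, // a release outranks its pre-releases
        (false, true) => Ordering::Less,
        (false, false) => {
            for (x, y) in a.pre.iter().zip(&b.pre) {
                let o = cmp_ident(x, y);
                if o != Ordering::Equal {
                    return o;
                }
            }
            // All shared identifiers equal: the longer list has higher precedence.
            a.pre.len().cmp(&b.pre.len())
        }
    })
}

fn main() {
    let mut versions: Vec<Version> = ["0.2.0", "0.2.0-beta.2", "0.1.0", "0.2.0-beta.10", "0.2.0-beta.1"]
        .iter()
        .map(|s| parse(s).expect("valid version string"))
        .collect();
    versions.sort_by(cmp_versions);
    for v in &versions {
        println!("{}", v.raw);
    }
}
```

Sorting the sample list yields 0.1.0, 0.2.0-beta.1, 0.2.0-beta.2, 0.2.0-beta.10, 0.2.0. Note that beta.10 lands after beta.2 because numeric identifiers compare as numbers, not as strings.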

Philpax · Answer 5 · Mon Aug 21 2023 15:57:28 GMT+0800 (China Standard Time)

Hi there! Yeah, I've considered it, but the main blocker is #221: I don't want to cut a release whose interface will be radically different in the next one. I'm hoping to have this all closed out within the next week or two, especially with GGUF on the horizon, but I've been quite busy.