rustformers / llm

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models

Home Page: https://docs.rs/llm/latest/llm/

Issues using llm with whisper-rs

jafioti opened this issue

Hi, I'm trying to use llm in the same project where I'm already using whisper-rs (https://github.com/tazz4843/whisper-rs), and the GGML builds for the two projects seem to be interfering with each other. Could it be that the crates look for the same files, and Cargo folds them into the same dependency?

For instance, when I load up a model in llm, I get this error: `thread 'main' panicked at 'called Result::unwrap() on an Err value: InvariantBroken { path: Some("./models/llama-2-7b-chat.ggmlv3.q4_0.bin"), invariant: "226001103 <= 2" }'`
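
For context, the load is essentially the example from the llm README, with the model path from the error above:

```rust
use std::path::Path;

fn main() {
    // Standard llm loading path, per the llm README.
    let llama = llm::load::<llm::models::Llama>(
        // path to the GGML model file
        Path::new("./models/llama-2-7b-chat.ggmlv3.q4_0.bin"),
        // use the tokenizer embedded in the model file
        llm::TokenizerSource::Embedded,
        // default llm::ModelParameters
        Default::default(),
        // print loading progress to stdout
        llm::load_progress_callback_stdout,
    )
    .unwrap(); // <- this is the unwrap that panics
    let _ = llama;
}
```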

When I remove whisper-rs from my project, it compiles and runs fine.

Any ideas on how to resolve this? I assumed I could just rename one of the sys crates, but that doesn't seem to help.

Yeah, that's unfortunately a little gnarly because both llm and whisper-rs use GGML - which is a C library with no function name mangling - so the linker has to pick one of the two conflicting implementations (and I believe whisper's is much older). Honestly, I'm surprised it compiled at all!
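
Roughly what's happening, as a simplified sketch (the real declarations are generated inside each -sys crate, and the return type here is simplified from `*mut ggml_context`):

```rust
// Both ggml-sys (pulled in by llm) and whisper-rs's sys crate bind the
// same unmangled C symbols, such as this one:
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct ggml_init_params {
    pub mem_size: usize,
    pub mem_buffer: *mut core::ffi::c_void,
    pub no_alloc: bool,
}

extern "C" {
    // Exported by BOTH linked static libraries. The linker keeps exactly
    // one definition of `ggml_init` for the final binary, so llm can end
    // up calling into whisper.cpp's older GGML, which is why loading a
    // newer model file trips the InvariantBroken check.
    pub fn ggml_init(params: ggml_init_params) -> *mut core::ffi::c_void;
}
```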

I would quite like to see an implementation of whisper in Rust, but it would require someone with more free time than me to do it.

Depending on how badly you need it, you could fork whisper-rs and whisper.cpp and rename things so that there are no conflicts, but that's obviously not ideal. For a short-term hacky fix, I'd suggest just breaking out the whisper-rs code into a separate application or dynamic library to ensure that the linker doesn't see both GGML implementations 😦
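
If you go the separate-application route, the glue on the llm side can be as simple as shelling out to a small transcription binary. A sketch, where `transcribe-cli` and its `--input` flag are hypothetical stand-ins for whatever thin wrapper you build around whisper-rs:

```rust
use std::process::Command;

/// Runs transcription in a separate process, so this binary never links
/// whisper.cpp's GGML. `transcribe-cli` is a hypothetical helper binary.
fn transcribe(audio_path: &str) -> std::io::Result<String> {
    // Spawn the helper and wait for it to finish.
    let output = Command::new("transcribe-cli")
        .arg("--input")
        .arg(audio_path)
        .output()?;

    if !output.status.success() {
        // Surface the helper's stderr as an error.
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            String::from_utf8_lossy(&output.stderr).into_owned(),
        ));
    }
    // The helper writes the transcript to stdout.
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```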

Candle has a completely Rust-native whisper example, which runs relatively fast. It doesn't support GGML models yet, but that's currently being worked on.

Of course! Do you know if they have any plans to break out the examples into their own libraries?

Well, I actually don't know; I'm currently only focused on helping a bit with the quantization support. But I'd guess they wouldn't be unwilling to split the examples into libraries.

Happy to report that the candle whisper demo works great! Certainly slower than ggml, but still reasonably fast. I'll close this out since it's not really an issue with this crate in particular.

@jafioti Theoretically, candle has supported quantized GGML tensors since yesterday, meaning you could probably recreate whisper.cpp with candle as a backend and get basically the same performance. Currently only q4_0 is supported, but I'm planning to port most of the quantization formats over.

Is there an example of using the 4-bit quantization? I'm using candle's llama, but when I set the dtype to u8 I get not-implemented errors.

Take a look at the quantized llama example. Basically, only the matmul operation supports quantized tensors, and it always produces an f32/f16 output: your weights are stored in the quantized format, but during inference you can use all candle operations as normal. You can create these QTensors either from a GGML file or by quantizing normal f32 tensors.
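
A rough sketch of that flow (written against a recent candle API; exact names and signatures have shifted a bit between releases, so treat the calls as assumptions):

```rust
use candle_core::quantized::{GgmlDType, QMatMul, QTensor};
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::Cpu;

    // Start from an ordinary f32 weight matrix...
    let weight = Tensor::randn(0f32, 1.0, (64, 64), &device)?;

    // ...and quantize it to q4_0. (Alternatively, QTensors can be read
    // straight out of a GGML file with candle's quantized file loader.)
    let qweight = QTensor::quantize(&weight, GgmlDType::Q4_0)?;

    // QMatMul is the one op that consumes quantized tensors directly.
    let qmatmul = QMatMul::from_qtensor(qweight)?;

    // Inputs and outputs stay in f32: the weights live in the quantized
    // format, the matmul dequantizes on the fly, and every other candle
    // op works on the float result as usual.
    let xs = Tensor::randn(0f32, 1.0, (1, 64), &device)?;
    let ys = qmatmul.forward(&xs)?;
    println!("output shape: {:?}", ys.shape());
    Ok(())
}
```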