rustformers / llm

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models

Home Page: https://docs.rs/llm/latest/llm/

WeightedIndex error: invalid weight

andri-jpg opened this issue

When trying to run a Pythia model using gptneox, I got this error. For context, I'm using Termux on Android with Rust installed to run the model.

$ cargo run --release -- gptneox infer -m pythia-160m-q4_0.bin -p "Tell me how cool the Rust programming language is:"
    Finished release [optimized] target(s) in 2.18s
     Running target/release/llm gptneox infer -m pythia-160m-q4_0.bin -p 'Tell me how cool the Rust programming language is:'
✓ Loaded 148 tensors (92.2 MB) after 293ms
<|padding|>Tell me how cool the Rust programming language is:
The application panicked (crashed).
Message:  WeightedIndex error: InvalidWeight
Location: crates/llm-base/src/samplers.rs:157

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Run with COLORBT_SHOW_HIDDEN=1 environment variable to disable frame filtering.
Run with RUST_BACKTRACE=full to include source snippets.
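For reference, the panic itself comes from rand's WeightedIndex, which refuses to build a sampling distribution if any weight is negative or NaN. A minimal sketch of that failure mode (assuming rand 0.8; this is not llm's actual sampler code):

```rust
use rand::distributions::WeightedIndex;

fn main() {
    // A single NaN among the softmaxed logits poisons the whole distribution.
    let weights = vec![0.5_f32, f32::NAN, 0.2];
    match WeightedIndex::new(&weights) {
        Ok(_) => println!("distribution built"),
        // rand rejects NaN because `!(NaN >= 0.0)` is true; this is the
        // `InvalidWeight` seen in the panic message above.
        Err(e) => println!("WeightedIndex error: {e:?}"),
    }
}
```

So NaN (or negative) weights derived from the model's logits would surface as exactly this panic.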

Interesting, can you link the exact model you used, and the model of phone you have? I suspect this is more likely an issue with the execution on the phone (which will be a more complicated issue to diagnose), but we should rule out any issues with the model on a PC first.

Here is the link to the model: https://huggingface.co/rustformers/pythia-ggml/blob/main/pythia-160m-q4_0.bin

FYI, this model runs fine on my PC. On my phone, though, I find that bloom and llama run smoothly.
My device is a Poco M3 Pro 5G with 4 GB of RAM.
[Screenshot of the Termux session]

Ok, I've done some more testing - this model "works" (produces a lot of garbage) on x86-64 Windows, but doesn't work on macOS ARM64.

I think this is an ARM64 issue, or at least it's more obviously broken on ARM64. We'll need to test with upstream GGML GPT-NeoX support to see if this is an issue with GGML or with our implementation.

Yeah, I think so too. Maybe only some models can run on the ARM64 architecture. llama.cpp (officially supported on Android, according to its documentation), alpaca, or vicuna should work fine on Android. When I saw that GPT-J was available in rustformers, I became interested in performing inference with Rust on Android, since llama.cpp previously only supported the larger models. I haven't tested the GPT-J family of models yet, because at the time they could only be run through the Transformers Python library, which requires Torch. Note that Torch cannot be installed in Termux, and the same applies to NumPy.

I got this error a few times while implementing Metal support (#311); there, it happened when a graph was not fully computed or was otherwise misconfigured (leading to garbage output). That was also on ARM64 (M1). So it's either something wrong with graph construction or some ARM64-specific race condition?

Edit: it could also just be the context running out of memory, or would that always lead to an error?
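One cheap way to narrow this down (a sketch, not part of llm's API; `validate_logits` is a hypothetical helper): check the logits for non-finite values right before sampling. If the graph is being miscomputed on ARM64, NaN/Inf should show up here with a much clearer message than the WeightedIndex panic.

```rust
/// Hypothetical pre-sampling guard: reject non-finite logits so a
/// miscomputed graph fails loudly instead of panicking in the sampler.
fn validate_logits(logits: &[f32]) -> Result<(), String> {
    for (i, &l) in logits.iter().enumerate() {
        if !l.is_finite() {
            return Err(format!("logit {i} is not finite: {l}"));
        }
    }
    Ok(())
}

fn main() {
    // Example: one NaN among otherwise valid logits.
    let logits = vec![1.0_f32, f32::NAN, -0.5];
    if let Err(e) = validate_logits(&logits) {
        eprintln!("bad logits: {e}");
    }
}
```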

By the way, what model do you use on ARM64? Which rustformers models are supported on ARM besides llama and bloom?