LlamaEdge / LlamaEdge

The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge

Home Page: https://llamaedge.com/

bug: Can't interact with Samantha-1.2-Mistral-7B via the API server

alabulei1 opened this issue

Summary

Can't interact with Samantha-1.2-Mistral-7B via the API server.

Device: Mac (Apple M2)

shasum -a 256 llama-api-server.wasm
8ea7d3c1fe723f83038c327ae601c999b0fe4f6d0f19a61b2fd59acb39af2d70  llama-api-server.wasm
shasum -a 256 ggml-metal.metal
f8ac2d9ddc60232f6836a2730828d739a8892c2fa829bdf484c560f5f7fba655  ggml-metal.metal
shasum -a 256 libwasmedgePluginWasiNN.dylib
8fb6908818d9daf88ad3aa8e5fdf77635a56a157f755a6dcd75f6731e64d3cad  libwasmedgePluginWasiNN.dylib
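
For reference, here is a minimal sketch of how these hashes could be checked against the values published for the installed release. The file name expected-checksums.txt and the placeholder hashes are assumptions for illustration, not real values:

```bash
# Hypothetical checksum list: replace the placeholders with the published values.
cat > expected-checksums.txt <<'EOF'
<expected-sha256>  llama-api-server.wasm
<expected-sha256>  ggml-metal.metal
<expected-sha256>  libwasmedgePluginWasiNN.dylib
EOF
# shasum -c reports OK/FAILED for each listed file.
shasum -a 256 -c expected-checksums.txt
```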

Reproduction steps

  1. Run the run-llm.sh script.
  2. Choose "18) Samantha-1.2-Mistral-7B".
  3. Choose to run with the API server.
  4. Open localhost:8080 and ask the question "Where is Beijing?". The error message is the following (a curl sketch for querying the server directly is included after the log):
Starting llama-api-server ...

+ '[' -n '<|im_end|>' ']'
+ wasmedge --dir .:. --nn-preload default:GGML:AUTO:samantha-1.2-mistral-7b-ggml-model-q4_0.gguf llama-api-server.wasm -p chatml -m Samantha-1.2-Mistral-7B -r '<|im_end|>'
[INFO] Socket address: 0.0.0.0:8080
[INFO] Model name: Samantha-1.2-Mistral-7B
[INFO] Model alias: default
[INFO] Prompt context size: 4096
[INFO] Number of tokens to predict: 1024
[INFO] Number of layers to run on the GPU: 100
[INFO] Batch size for prompt processing: 4096
[INFO] Reverse prompt: <|im_end|>
[INFO] Prompt template: ChatML
[INFO] Log prompts: false
[INFO] Log statistics: false
[INFO] Log all information: false
[INFO] Starting server ...
[INFO] Listening on http://0.0.0.0:8080
GGML_ASSERT: /Users/hydai/workspace/WasmEdge/plugins/wasi_nn/thirdparty/ggml/ggml-metal.m:1459: false
/dev/fd/11: line 356:   974 Abort trap: 6           wasmedge --dir .:. --nn-preload default:GGML:AUTO:$model_file llama-api-server.wasm -p $prompt_template -m "${model}" -r "${reverse_prompt[@]}"
+ set +x
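
For reference, the same question can also be sent to the running server directly with curl. This is a minimal sketch assuming the OpenAI-compatible /v1/chat/completions route exposed by llama-api-server; the request body is illustrative:

```bash
# Query the API server started above; the model name matches the -m flag
# passed to llama-api-server.wasm in the trace.
curl -sS -X POST http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "Samantha-1.2-Mistral-7B",
        "messages": [
          {"role": "user", "content": "Where is Beijing?"}
        ]
      }'
```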

Screenshots

[Screenshot of the error output omitted]

Any logs you want to share for showing the specific issue

No response

Model Information

samantha-1.2-mistral-7b-ggml-model-q4_0.gguf

Operating system information

macOS 14.1.1 (23B81)

ARCH

M2

CPU Information

M2

Memory Size

16GB

GPU Information

M2

VRAM Size

I don't know

#52 solved my problem.