LlamaEdge / LlamaEdge

The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge

Home Page: https://llamaedge.com/

Feature Request: Output response of API server

conikeec opened this issue

Summary

It would be preferable to have the message content in the response represented as markdown (formattable).

I was trying to use deepseek-coder with the API server, and the response JSON had a message with embedded code that wasn't very readable.

Request

curl -X POST http://localhost:8080/v1/chat/completions -H 'accept:application/json' -H 'Content-Type: application/json' -d '{"messages":[{"role":"system", "content": "You are a experienced rust engineer"}, {"role":"user", "content": "Write a async fn that accepts a string and reverses it"}], "model":"deepseek-coder"}' | jq .

Response

{
  "id": "b3d79a47-8169-4825-9794-af0582c0fe45",
  "object": "chat.completion",
  "created": 1707982054,
  "model": "deepseek-coder",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Sure, here is an example of how you can write an asynchronous function in Rust to reverse a string. This function uses the `tokio` crate for its asynchronicity. \n\n```rust\nuse tokio::fs;\n\n#[tokio::main]\nasync fn main() {\n    let str = \"Hello, world!\";\n    printlnfmt::println(\"Reversed string: {}\", reverse_string(str).await.unwrap()).unwrap();\n}\n\nasync fn reverse_string(s: &str) -> std::result::Result<String, Box<dyn std::error::Error>> {\n    let mut chars = s.chars();\n    let mut result = String::new();\n    \n    while let Some(c) = chars.next() {\n        fs::write(\"temp\", c).await?;\n        let read_content = std::fs::read_to_string(\"temp\").unwrap();\n        result.push_str(&read_content);\n    }\n    \n    Ok(result)\n}\n```\nPlease note that this is a simple example and may not be the most efficient way to reverse a string in Rust due to the asynchronous nature of the function. In real-world applications, you would likely use a more direct method to reverse a string without needing to write/read from a file. \n\nAlso, please note that this code will only work on Unix systems because it uses `fs::write` and `std::fs::read_to_string` which are specific to the filesystem of your system. If you want to use this function in Windows or any other OS, you would need to replace these functions with equivalent ones for that OS."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 386,
    "total_tokens": 414
  }
}
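
As a side note, the markdown is still intact inside the choices[0].message.content string; assuming jq is available (as in the request above), it can be pulled out for readable display with something like:

curl -X POST http://localhost:8080/v1/chat/completions -H 'accept:application/json' -H 'Content-Type: application/json' -d '{...same body as above...}' | jq -r '.choices[0].message.content'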

Hi @conikeec

Thank you for the suggestions. The goal of this api-server project is to be OpenAI-compatible so that OpenAI tools can switch directly to a private WasmEdge server. That means conforming to the OpenAI JSON format.

However, a key feature of LlamaEdge is that it is also a development platform. You should be able to modify the api-server Rust source code so that it simply returns the model response in the HTTP body as markdown instead of assembling it into a JSON string. Here is a simplified example that shows how to call the LLM inference function from the Rust code.

https://github.com/second-state/WasmEdge-WASINN-examples/tree/master/wasmedge-ggml/llama
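
For reference, a minimal sketch of that call path (not the actual api-server code; it assumes the wasmedge_wasi_nn crate and a model preloaded under the name "default") could look roughly like this:

```rust
// Minimal sketch: run one prompt through the preloaded GGML model and
// print the raw completion (markdown text) instead of wrapping it in JSON.
// Assumes the model was preloaded with `--nn-preload default:GGML:AUTO:<model>.gguf`.
use wasmedge_wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn main() {
    let prompt = "Write an async fn that accepts a string and reverses it";

    // Load the graph registered under the name "default".
    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .build_from_cache("default")
        .expect("failed to load the preloaded model");
    let mut context = graph
        .init_execution_context()
        .expect("failed to init the execution context");

    // Feed the prompt as a UTF-8 byte tensor and run inference.
    context
        .set_input(0, TensorType::U8, &[1], prompt.as_bytes())
        .expect("failed to set input");
    context.compute().expect("failed to compute");

    // Read the completion back and print it as-is, i.e. as markdown.
    let mut output = vec![0u8; 8192];
    let size = context.get_output(0, &mut output).expect("failed to get output");
    println!("{}", String::from_utf8_lossy(&output[..size]));
}
```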

Once you compile your new api-server to Wasm, you should be able to deploy it to run on any device.
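
Assuming a standard cargo project, building and running could look roughly like this (the .wasm and .gguf file names below are placeholders):

cargo build --target wasm32-wasi --release

wasmedge --dir .:. --nn-preload default:GGML:AUTO:your-model.gguf target/wasm32-wasi/release/your-api-server.wasm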

Do you want to give it a try? :)

Hi Juntao,
I have tried to build this project, but it doesn't work now:

➜ llama git:(master) wasmedge --dir .:. \
    --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf \
    wasmedge-ggml-llama.wasm default
[2024-03-08 14:47:38.461] [error] instantiation failed: module name conflict, Code: 0x60
[2024-03-08 14:47:38.462] [error] At AST node: module
[2024-03-08 14:47:38.467] [error] [WASI-NN] GGML backend: Error: unable to init model.
thread 'main' panicked at 'Failed to build graph: BackendError(InvalidArgument)', src/main.rs:83:17
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
[2024-03-08 14:47:38.467] [error] execution failed: unreachable, Code: 0x89
[2024-03-08 14:47:38.467] [error] In instruction: unreachable (0x00) , Bytecode offset: 0x000143d7
[2024-03-08 14:47:38.467] [error] When executing function name: "_start"
➜ llama git:(master)

Do you have the llama-2-7b-chat.Q5_K_M.gguf file in your local directory? What is its size (e.g., from ls -al)?

Also, I want to note that wasmedge-ggml-llama.wasm is NOT the API server app.