LlamaEdge / LlamaEdge

The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge

Home Page: https://llamaedge.com/

bug: Phi-2 support is not exposed via the API Server CLI

ChristianWeyer opened this issue

Summary

The Phi-2 prompt template is implemented internally:
https://github.com/second-state/LlamaEdge/blob/6eed9d5b25133e623f643e212c4a672bd2c769e6/api-server/chat-prompts/src/lib.rs#L55

But it is not exposed through the CLI:
https://github.com/second-state/LlamaEdge/blob/6eed9d5b25133e623f643e212c4a672bd2c769e6/api-server/llama-api-server/src/main.rs#L166

Therefore we get an error when trying to run Phi-2:

wasmedge --dir .:. --nn-preload default:GGML:AUTO:phi-2-Q6_K.gguf llama-api-server.wasm --prompt-template phi-2-instruct --model-name phi-2 --socket-addr 127.0.0.1:8080 --log-prompts --log-stat
error: invalid value 'phi-2-instruct' for '--prompt-template <TEMPLATE>'
  [possible values: llama-2-chat, codellama-instruct, codellama-super-instruct, mistral-instruct-v0.1, mistral-instruct, mistrallite, openchat, human-assistant, vicuna-1.0-chat, vicuna-1.1-chat, chatml, baichuan-2, wizard-coder, zephyr, stablelm-zephyr, intel-neural, deepseek-chat, deepseek-coder, solar-instruct]

  tip: a similar value exists: 'mistral-instruct'

For more information, try '--help'.

Reproduction steps

wasmedge --dir .:. --nn-preload default:GGML:AUTO:phi-2-Q6_K.gguf llama-api-server.wasm --prompt-template phi-2-instruct --model-name phi-2 --socket-addr 127.0.0.1:8080 --log-prompts --log-stat

Screenshots

No response

Any logs you want to share for showing the specific issue

No response

Model Information

phi-2-Q6_K.gguf

Operating system information

macOS 14.2.1

ARCH

arm64

CPU Information

Apple M1 Max

Memory Size

64GB

GPU Information

Apple M1 Max

VRAM Size

Apple M1 Max

@ChristianWeyer Thanks for your report. As you found, we implemented the phi-2-chat prompt type internally and used it to evaluate Phi-2. From our observations while chatting with Phi-2, we found that, unlike normal chat models that usually have two roles (User and Assistant), Phi-2 uses multiple roles: Alice, Bob, Charlie, ..., and of course User. For now, we cannot support a chat model with multiple roles, which is why we do not expose the phi-2-chat prompt type to our users. Thanks a lot!
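
For context, the multi-role chat format mentioned above looks roughly like this (paraphrased from Microsoft's Phi-2 model card; the exact wording is only illustrative):

Alice: I'm struggling to stay focused while studying. Any suggestions?
Bob: Have you tried creating a study schedule and sticking to it?
Alice: Yes, but it doesn't seem to help much.

An OpenAI-style chat request only carries system/user/assistant messages, so there is no obvious mapping onto free-form named roles like these.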

Ah, got it - thanks @apepkuss. In that case, I was confused by the release notes, which state that Phi-2 is supported.

If you would like to try Phi-2, you can use the instruct mode with the phi-2-instruct prompt type.
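
For reference, the instruct style wraps a single-turn prompt roughly like this (format as described in the Phi-2 model card; the template in chat-prompts may differ in details such as whitespace):

Instruct: Write a short poem about the ocean.
Output: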

As you can see above in my report, this is what I am trying to do... :-)

@ChristianWeyer You have to use llama-chat.wasm if you'd like to run phi-2 with the phi-2-instruct prompt type. You can refer to the command mentioned in second-state/phi-2-GGUF to run it.
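
Adapting the command from your report to the chat app, it would be along these lines (the exact flags are in that README and may differ between llama-chat.wasm releases):

wasmedge --dir .:. --nn-preload default:GGML:AUTO:phi-2-Q6_K.gguf llama-chat.wasm --prompt-template phi-2-instruct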

OK, got that. However, I need an OpenAI API-compatible endpoint for my use cases.
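
That is, I need the server to answer requests like the following (address and model name taken from my command above):

curl -X POST http://127.0.0.1:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{"model": "phi-2", "messages": [{"role": "user", "content": "What is a prime number?"}]}'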

OK, I see. I'll look into the possibility of supporting it in the api-server. If it works, I'll follow up here.

What is the current state of this issue? Thanks!

Has this been implemented in the meantime?
Thanks!