LlamaEdge / LlamaEdge

The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge

Home Page: https://llamaedge.com/

bug: Phi-2 support is not exposed via the API Server CLI

ChristianWeyer opened this issue

Summary

The Phi-2 prompt template is implemented internally:
https://github.com/second-state/LlamaEdge/blob/6eed9d5b25133e623f643e212c4a672bd2c769e6/api-server/chat-prompts/src/lib.rs#L55

But it is not exposed through the CLI:
https://github.com/second-state/LlamaEdge/blob/6eed9d5b25133e623f643e212c4a672bd2c769e6/api-server/llama-api-server/src/main.rs#L166

Therefore we get an error when trying to run Phi-2:

wasmedge --dir .:. --nn-preload default:GGML:AUTO:phi-2-Q6_K.gguf llama-api-server.wasm --prompt-template phi-2-instruct --model-name phi-2 --socket-addr 127.0.0.1:8080 --log-prompts --log-stat
error: invalid value 'phi-2-instruct' for '--prompt-template <TEMPLATE>'
  [possible values: llama-2-chat, codellama-instruct, codellama-super-instruct, mistral-instruct-v0.1, mistral-instruct, mistrallite, openchat, human-assistant, vicuna-1.0-chat, vicuna-1.1-chat, chatml, baichuan-2, wizard-coder, zephyr, stablelm-zephyr, intel-neural, deepseek-chat, deepseek-coder, solar-instruct]

  tip: a similar value exists: 'mistral-instruct'

For more information, try '--help'.

Reproduction steps

wasmedge --dir .:. --nn-preload default:GGML:AUTO:phi-2-Q6_K.gguf llama-api-server.wasm --prompt-template phi-2-instruct --model-name phi-2 --socket-addr 127.0.0.1:8080 --log-prompts --log-stat

Screenshots

No response

Any logs you want to share for showing the specific issue

No response

Model Information

phi-2-Q6_K.gguf

Operating system information

macOS 14.2.1

ARCH

arm64

CPU Information

Apple M1 Max

Memory Size

64GB

GPU Information

Apple M1 Max

VRAM Size

Apple M1 Max

@ChristianWeyer Thanks for your report. As you found, we implemented the phi-2-chat prompt type internally and used it to evaluate Phi-2. From our observations while chatting with Phi-2, we found that, unlike normal chat models that usually have two roles (User and Assistant), Phi-2 uses multiple roles: Alice, Bob, Charlie, ..., and of course User. For now, we cannot support a chat model with multiple roles, which is why we do not expose the phi-2-chat prompt type to our users. Thanks a lot!
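
For context, the multi-role chat format mentioned above looks roughly like this (paraphrased from Microsoft's Phi-2 model card; the exact wording is only illustrative):

Alice: I'm struggling to stay focused while studying. Any suggestions?
Bob: Have you tried creating a study schedule and sticking to it?
Alice: Yes, but it doesn't seem to help much.

An OpenAI-style chat request only carries system/user/assistant messages, so there is no obvious mapping onto free-form named roles like these.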

Ah, got it - thanks @apepkuss. In that case, I was confused by the release notes, which state that Phi-2 is supported.

If you would like to try Phi-2, you can use the instruct mode with the phi-2-instruct prompt type.
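
For reference, the instruct style wraps a single-turn prompt roughly like this (format as described in the Phi-2 model card; the template in chat-prompts may differ in details such as whitespace):

Instruct: Write a short poem about the ocean.
Output: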

As you can see above in my report, this is what I am trying to do... :-)

@ChristianWeyer You have to use llama-chat.wasm if you'd like to run phi-2 with the phi-2-instruct prompt type. You can refer to the command mentioned in second-state/phi-2-GGUF to run it.
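
Adapting the command from your report to the chat app, it would be along these lines (the exact flags are in that README and may differ between llama-chat.wasm releases):

wasmedge --dir .:. --nn-preload default:GGML:AUTO:phi-2-Q6_K.gguf llama-chat.wasm --prompt-template phi-2-instruct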

OK, got that. However, I need an OpenAI API-compatible endpoint for my use cases.
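
That is, I need the server to answer requests like the following (address and model name taken from my command above):

curl -X POST http://127.0.0.1:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{"model": "phi-2", "messages": [{"role": "user", "content": "What is a prime number?"}]}'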

OK, I see. I'll look into the possibility of supporting it in the api-server. If it works, I'll follow up here.

What is the current state of this issue? Thanks!

Has this been implemented in the meantime?
Thanks!