DachengLi1 / LongChat

Official repository for LongChat and LongEval

longchat-13b-16k chat does not work

ahkimkoo opened this issue

@ahkimkoo It has not been trained on Chinese data, so please use only English for now.


Thank you for your reply, but even when I use English it cannot reply normally.

Can you share a screenshot of how you are loading the model and what inputs you give?

@DachengLi1 I would like to follow up on this. I'm having the same issue, running the same model through FastChat's OpenAI API server implementation. I get the same outputs (sometimes screaming like "A A A A A A A A A A A A") while running the latest version with the monkey patch applied.

Here are the requests I send to the endpoint and the corresponding output:

curl http://localhost:8100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer fattiicazzituoi" \
  -d '{
     "model": "longchat-13b-16k",
     "messages": [{"role": "user", "content": "Say this is a test."}],
     "temperature": 0.3, "max_tokens": 200
   }'

{"id":"chatcmpl-3tF6uZ7GXm54dmLwfGLQ3y","object":"chat.completion","created":1689243774,"model":"lmsys/longchat-13b-16k","choices":[{"index":0,"message":{"role":"assistant","content":"A A A A A A A A A A A A A A A A A A A A"},"finish_reason":"stop"}],"usage":{"prompt_tokens":45,"total_tokens":64,"completion_tokens":19}}

But if I use the "completions" (non-chat) endpoint the model works "correctly" (or at least it does not scream at me):

curl http://localhost:8100/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer aaaaaaaaaaaa" \
  -d '{
    "model": "lmsys/longchat-13b-16k",
    "prompt": "Say this is a test.",
    "max_tokens": 20,
    "temperature": 0.5
  }'

{"id":"cmpl-sBiuu78WYegWDnU3WDmFmF","object":"text_completion","created":1689245357,"model":"lmsys/longchat-13b-16k","choices":[{"index":0,"text":"\nYou are a test.\n\n\n\n\n\n\n\n\n\n\n\n\n\n","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":7,"total_tokens":26,"completion_tokens":19}}
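One plausible explanation for the difference between the two endpoints (an assumption about FastChat's behavior, not a reading of its actual code): the chat endpoint wraps the messages in the model's conversation template before generation, while the completion endpoint sends the prompt verbatim. If the serving-side template or separators do not match what the model was fine-tuned on, only the chat endpoint degenerates. A minimal sketch, using an illustrative Vicuna-style template:

```python
# Hypothetical sketch of the two prompt-construction paths, assuming a
# Vicuna-style template; the exact system prompt and separators here are
# illustrative, not FastChat's actual implementation.

def build_chat_prompt(messages):
    """Wrap chat messages in a Vicuna-style conversation template."""
    system = ("A chat between a curious user and an artificial "
              "intelligence assistant.")
    roles = {"user": "USER", "assistant": "ASSISTANT"}
    parts = [system]
    for m in messages:
        parts.append(f"{roles[m['role']]}: {m['content']}")
    parts.append("ASSISTANT:")  # generation continues from here
    return " ".join(parts)

# What /v1/chat/completions would feed the model:
chat_prompt = build_chat_prompt(
    [{"role": "user", "content": "Say this is a test."}]
)

# What /v1/completions sends: the raw prompt, no template at all.
raw_prompt = "Say this is a test."

print(chat_prompt)
print(raw_prompt)
```

If the degenerate "A A A ..." output only appears on the templated path, that points at a mismatch between the serving template and the fine-tuning format rather than at the weights themselves.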

TL;DR: LongChat-13B-16K goes like this:
[image attachment]

@scuty2000 Fun image, lol.
@merrymercy do you have an idea on this? Is there a difference in loading for completions versus chat completions (e.g. load_8bit, patching)?

@DachengLi1 I don't know if this can help, but I suspect is related to the int8 quantization. Using the 7B version not quantized works pretty well.
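To make the quantization hypothesis concrete, here is an illustrative sketch (not FastChat's or bitsandbytes' actual code) of symmetric int8 weight quantization and the rounding error it introduces. Each weight picks up an error of at most half a quantization step; across billions of weights, and especially in a model stretched to a 16K context, those small perturbations can plausibly compound into degraded generations.

```python
# Illustrative symmetric int8 quantization, to show the per-weight
# rounding error it introduces. This is a toy model of the effect, not
# the quantization scheme any particular library uses.

def quantize_int8(weights):
    """Quantize floats to int8 with a single symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to floats."""
    return [v * scale for v in q]

weights = [0.013, -0.482, 0.291, 0.0007, -0.055]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each restored weight differs from the original by at most half a
# quantization step (scale / 2).
errors = [abs(w - r) for w, r in zip(weights, restored)]
assert all(e <= scale / 2 + 1e-12 for e in errors)
```

This would be consistent with the observation above: running the unquantized 7B model avoids the rounding error entirely, which is why it behaves well where the int8 13B does not.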

@scuty2000 Yes, I have also heard that elsewhere.

Any update on this?