LlamaEdge / LlamaEdge

The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge

Home Page: https://llamaedge.com/

bug:

zyxcambridge opened this issue

Summary

  abstract_summary = abstract_summary_extraction(transcription)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhangyixin/Desktop/MeetingSummary/openai_meetging_summary.py", line 36, in abstract_summary_extraction
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhangyixin/anaconda3/lib/python3.11/site-packages/openai/_utils/_utils.py", line 299, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhangyixin/anaconda3/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 594, in create
    return self._post(
           ^^^^^^^^^^^
  File "/Users/zhangyixin/anaconda3/lib/python3.11/site-packages/openai/_base_client.py", line 1055, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhangyixin/anaconda3/lib/python3.11/site-packages/openai/_base_client.py", line 834, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "/Users/zhangyixin/anaconda3/lib/python3.11/site-packages/openai/_base_client.py", line 899, in _request
    return self._process_response(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhangyixin/anaconda3/lib/python3.11/site-packages/openai/_base_client.py", line 506, in _process_response
    return api_response.parse()
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhangyixin/anaconda3/lib/python3.11/site-packages/openai/_response.py", line 59, in parse
    parsed = self._parse()
             ^^^^^^^^^^^^^
  File "/Users/zhangyixin/anaconda3/lib/python3.11/site-packages/openai/_response.py", line 175, in _parse
    content_type, *_ = response.headers.get("content-type").split(";")
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
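
For reference, a minimal sketch of the failing call path, assuming the OpenAI Python client is pointed at the local LlamaEdge server via base_url. The helper name mirrors the traceback; the api_key value is a placeholder, since the local server does not check it.

from openai import OpenAI

# Hypothetical setup: point the OpenAI client at the local LlamaEdge server.
client = OpenAI(base_url="http://0.0.0.0:8080/v1", api_key="not-needed")

def abstract_summary_extraction(transcription):
    # When the server rejects the request (e.g. "prompt is too long"), the
    # response comes back without a Content-Type header, and the client
    # raises the AttributeError shown above while parsing it.
    response = client.chat.completions.create(
        model="Yi-34B-Chat",
        messages=[
            {"role": "system", "content": "You are a helpful AI assistant"},
            {"role": "user", "content": transcription},
        ],
    )
    return response.choices[0].message.content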

Reproduction steps

curl -X POST http://0.0.0.0:8080/v1/chat/completions -H 'accept:application/json' -H 'Content-Type: application/json' -d '{"messages":[{"role":"system", "content":"You are a helpful AI assistant"}, {"role":"user", "content":"What is the capital of France?"}], "model":"Yi-34B-Chat"}'
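
A hypothetical diagnostic (not part of the original report): send the same request as the curl command above with the requests library and print the raw status code and Content-Type header, to confirm whether the header is missing from the server's response.

import requests

# Send the same chat completion request as the curl reproduction above.
resp = requests.post(
    "http://0.0.0.0:8080/v1/chat/completions",
    headers={"accept": "application/json", "Content-Type": "application/json"},
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful AI assistant"},
            {"role": "user", "content": "What is the capital of France?"},
        ],
        "model": "Yi-34B-Chat",
    },
)

# Inspect the raw response; a missing content-type here matches the
# AttributeError raised inside the openai client.
print(resp.status_code, resp.headers.get("content-type"))
print(resp.text)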

Screenshots

Any logs you want to share for showing the specific issue

wasmedge --dir .:. --nn-preload default:GGML:AUTO:TheBloke/Yi-34B-Chat-GGUF/yi-34b-chat.Q8_0.gguf llama-api-server.wasm -p chatml
[INFO] Socket address: 0.0.0.0:8080
[INFO] Model name: default
[INFO] Model alias: default
[INFO] Prompt context size: 512
[INFO] Number of tokens to predict: 1024
[INFO] Number of layers to run on the GPU: 100
[INFO] Batch size for prompt processing: 512
[INFO] Temperature for sampling: 0.8
[INFO] Penalize repeat sequence of tokens: 1.1
[INFO] Prompt template: ChatML
[INFO] Log prompts: false
[INFO] Log statistics: false
[INFO] Log all information: false
[INFO] Starting server ...
[INFO] Plugin version: b1953 (commit 6f9939d1)
[INFO] Listening on http://0.0.0.0:8080

[WARNING] The prompt is too long. Please reduce the length of your input and try again.

[WARNING] The prompt is too long. Please reduce the length of your input and try again.

[WARNING] The prompt is too long. Please reduce the length of your input and try again.

[WARNING] The prompt is too long. Please reduce the length of your input and try again.

Model Information

yi34

Operating system information

mac m1

ARCH

arm

CPU Information

m1

Memory Size

64 GB

GPU Information

m1

VRAM Size

64 GB

@zyxcambridge The command you provided is not correct. There should be a reverse prompt. Please refer to the following command:

wasmedge --dir .:. --nn-preload default:GGML:AUTO:yi-34b-chat.Q5_K_M.gguf llama-api-server.wasm -p chatml --reverse-prompt  '<|im_end|>' --ctx-size 2048
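
Note that the server log above reports "Prompt context size: 512" and repeatedly warns that the prompt is too long; the --ctx-size 2048 flag in the suggested command raises that limit. The missing Content-Type header in the Python traceback appears to occur when the server rejects the request this way, which is why the openai client fails with an AttributeError instead of returning a normal error message.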