dify 0.6.8 fails to deploy local xinference vl model(qwen-vl-chat)
ricky977 opened this issue
Self Checks
- This is only for bug report, if you would like to ask a question, please head to Discussions.
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- Please do not modify this template :) and fill in all the required fields.
Dify version
0.6.8
Cloud or Self Hosted
Self Hosted (Docker), Self Hosted (Source)
Steps to reproduce
I built Dify 0.6.8 from both Docker and source, and deployed the local qwen-vl-chat model under Xinference according to the instructions, using the machine's IP address and Xinference's default port, 9997. However, calling the qwen-vl model fails and keeps reporting the error "Run failed: Node LLM run failed: [xinference] An error occurred during streaming". The same error occurs even when only text is entered without uploading images, although the model has been successfully connected to the platform. Operating system: Ubuntu 22.04 / CentOS 7.
The backend logs show no other error information, and the model's API call itself succeeds.
At the same time, calling the vision model qwen-vl-chat1 from the Xinference web page works normally, but an error is reported when it is accessed from Dify.
The Xinference model information is as follows:
✔️ Expected Behavior
I hope to call the qwen-vl-chat model successfully.
❌ Actual Behavior
Run failed:Node LLM run failed:[xinference]An error occurred during streaming
Hey @ricky977! 👋 I'm here to help you with any bugs, questions, or contributions. Let's squash those bugs together!
To resolve the error "Run failed: Node LLM run failed: [xinference] An error occurred during streaming" when deploying the local qwen-vl-chat model under Xinference using Dify version 0.6.8, you need to disable streaming mode: Xinference's function calling does not currently support stream mode.
Here is how you can modify your code to disable streaming:
import os

from core.model_runtime.entities.message_entities import (
    SystemPromptMessage,
    UserPromptMessage,
)

# 'model' is the Xinference LLM instance from Dify's model runtime
response = model.invoke(
    model='ChatGLM3',
    credentials={
        'server_url': os.environ.get('XINFERENCE_SERVER_URL'),
        'model_uid': os.environ.get('XINFERENCE_CHAT_MODEL_UID')
    },
    prompt_messages=[
        SystemPromptMessage(
            content='You are a helpful AI assistant.',
        ),
        UserPromptMessage(
            content='Hello World!'
        )
    ],
    model_parameters={
        'temperature': 0.7,
        'top_p': 1.0,
    },
    stop=['you'],
    stream=False,  # Disable streaming
    user="abc-123"
)
By setting stream=False, you should be able to avoid the streaming error.
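Since the model responds correctly from the Xinference web page, it can also help to check where the same flag lives when Xinference's OpenAI-compatible chat endpoint is called directly. The sketch below only builds the request body for /v1/chat/completions; the model UID value is a placeholder, not taken from this issue:

```python
import json

def build_chat_request(model_uid: str, prompt: str, stream: bool = False) -> str:
    """Build the JSON body for Xinference's OpenAI-compatible
    /v1/chat/completions endpoint (sketch, not a full client)."""
    return json.dumps({
        "model": model_uid,  # the UID shown on the Xinference deployment page
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,    # keep False to avoid the streaming error
    })
```

POSTing this body to http://<Machine_IP>:9997/v1/chat/completions with stream set to False should reproduce the non-streaming path that Dify needs.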
Additionally, ensure that your Xinference deployment is correctly set up. Follow these steps to deploy Xinference locally:

1. Install Xinference via PyPI:

   $ pip install "xinference[all]"

2. Start Xinference locally:

   $ xinference-local
   2023-08-20 19:21:05,265 xinference                 10148 INFO     Xinference successfully started. Endpoint: http://127.0.0.1:9997
   2023-08-20 19:21:05,266 xinference.core.supervisor 10148 INFO     Worker 127.0.0.1:37822 has been added successfully
   2023-08-20 19:21:05,267 xinference.deploy.worker   10148 INFO     Xinference worker successfully started.

   Xinference starts with the endpoint http://127.0.0.1:9997 on the default port 9997. By default it is accessible only from localhost; pass -H 0.0.0.0 to allow access from external clients. For more configuration options, run xinference-local --help.

3. Open http://127.0.0.1:9997 to select and deploy the required model. Ensure the model is compatible with your hardware platform.

4. Obtain the model UID from the deployment page.

5. In Dify, go to Settings > Model Providers > Xinference and fill in:
   - Model Name: vicuna-v1.3 (use the name of your deployed model, e.g. qwen-vl-chat)
   - Server URL: http://<Machine_IP>:9997 (replace with your machine's IP address)
   - Model UID: the UID obtained from the deployment page
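A common pitfall with the Server URL: when Dify runs inside Docker, the container cannot reach Xinference through localhost or 127.0.0.1, so the machine's LAN IP must be used. A minimal sketch of composing and sanity-checking that URL (the helper name is mine, not part of Dify or Xinference):

```python
def xinference_server_url(host: str, port: int = 9997) -> str:
    """Build the Server URL for Dify's Xinference provider settings.

    host should be the machine's LAN IP (e.g. "192.168.1.20"): when Dify
    runs inside Docker, the container cannot reach the host via localhost.
    """
    if host in ("localhost", "127.0.0.1"):
        raise ValueError("Use the machine's IP address; Dify's Docker "
                         "container cannot reach the host via localhost")
    return f"http://{host}:{port}"
```

For example, xinference_server_url("192.168.1.20") returns "http://192.168.1.20:9997", which matches the http://<Machine_IP>:9997 form above.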
By following these steps, you should be able to deploy and use the local qwen-vl-chat model without encountering the streaming error. Note that the Xinference implementation currently does not support streaming mode for tool calls, as indicated in the code:
if tools and len(tools) > 0:
    raise InvokeBadRequestError('xinference tool calls does not support stream mode')
Therefore, ensure that you disable streaming when invoking models that involve tool calls [1][2][3][4].
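The guard quoted above can be mirrored as a standalone fail-fast check before invoking the model. This is an illustrative adaptation, not Dify's actual code: the error class is a stand-in, and the check is extended to only fire when streaming is actually requested:

```python
class InvokeBadRequestError(Exception):
    """Stand-in for Dify's InvokeBadRequestError."""

def check_stream_compatibility(tools, stream):
    # Mirrors the guard quoted above: Xinference tool calls cannot be
    # combined with stream mode, so reject the combination up front.
    if stream and tools and len(tools) > 0:
        raise InvokeBadRequestError(
            'xinference tool calls does not support stream mode')

# Fine: streaming without tools.
check_stream_compatibility(tools=[], stream=True)
# Fine: tools with stream=False.
check_stream_compatibility(tools=[{'name': 'search'}], stream=False)
```

Calling it with both tools and stream=True raises InvokeBadRequestError, which is exactly the failure mode to avoid by setting stream=False.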