dify 0.6.8 fails to deploy local xinference vl model(qwen-vl-chat)
ricky977 opened this issue
Self Checks
- This is only for bug report, if you would like to ask a question, please head to Discussions.
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- Please do not modify this template :) and fill in all the required fields.
Dify version
0.6.8
Cloud or Self Hosted
Self Hosted (Docker), Self Hosted (Source)
Steps to reproduce
I built Dify 0.6.8 from both Docker and source, and deployed the local qwen-vl-chat model under Xinference according to the instructions, using the machine's IP address and Xinference's default port, 9997. However, calling the qwen-vl model fails and keeps reporting the error "Run failed: Node LLM run failed: [xinference] An error occurred during streaming". The same error occurs even when only text is entered without uploading images, although the model has been successfully connected to the platform. Operating system: Ubuntu 22.04 / CentOS 7.
The backend logs show no other error information, and the model's API call itself succeeds.
At the same time, calling the vision model qwen-vl-chat1 from the Xinference web page works normally, but an error is reported when it is accessed from Dify.
The Xinference model information is as follows:
✔️ Expected Behavior
I hope to call the qwen-vl-chat model successfully.
❌ Actual Behavior
Run failed:Node LLM run failed:[xinference]An error occurred during streaming
Hey @ricky977! 👋 I'm here to help you with any bugs, questions, or contributions. Let's squash those bugs together!
To resolve the error "Run failed: Node LLM run failed: [xinference] An error occurred during streaming" when deploying the local qwen-vl-chat model under Xinference using Dify version 0.6.8, you need to disable streaming mode: Xinference's function calling does not currently support stream mode.
Here is how you can modify your code to disable streaming:
import os

from core.model_runtime.entities.message_entities import (
    SystemPromptMessage,
    UserPromptMessage,
)

# 'model' is the Xinference LLM instance from Dify's model runtime
response = model.invoke(
    model='ChatGLM3',
    credentials={
        'server_url': os.environ.get('XINFERENCE_SERVER_URL'),
        'model_uid': os.environ.get('XINFERENCE_CHAT_MODEL_UID')
    },
    prompt_messages=[
        SystemPromptMessage(
            content='You are a helpful AI assistant.',
        ),
        UserPromptMessage(
            content='Hello World!'
        )
    ],
    model_parameters={
        'temperature': 0.7,
        'top_p': 1.0,
    },
    stop=['you'],
    stream=False,  # Disable streaming
    user="abc-123"
)
By setting stream=False, you should be able to avoid the streaming error.
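Since the model responds correctly from the Xinference web page, it can also help to check where the same flag lives when Xinference's OpenAI-compatible chat endpoint is called directly. The sketch below only builds the request body for /v1/chat/completions; the model UID value is a placeholder, not taken from this issue:

```python
import json

def build_chat_request(model_uid: str, prompt: str, stream: bool = False) -> str:
    """Build the JSON body for Xinference's OpenAI-compatible
    /v1/chat/completions endpoint (sketch, not a full client)."""
    return json.dumps({
        "model": model_uid,  # the UID shown on the Xinference deployment page
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,    # keep False to avoid the streaming error
    })
```

POSTing this body to http://<Machine_IP>:9997/v1/chat/completions with stream set to False should reproduce the non-streaming path that Dify needs.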
Additionally, ensure that your Xinference deployment is correctly set up. Follow these steps to deploy Xinference locally:

1. Install Xinference via PyPI:

   $ pip install "xinference[all]"

2. Start Xinference locally:

   $ xinference-local
   2023-08-20 19:21:05,265 xinference                 10148 INFO     Xinference successfully started. Endpoint: http://127.0.0.1:9997
   2023-08-20 19:21:05,266 xinference.core.supervisor 10148 INFO     Worker 127.0.0.1:37822 has been added successfully
   2023-08-20 19:21:05,267 xinference.deploy.worker   10148 INFO     Xinference worker successfully started.

   Xinference starts with the endpoint http://127.0.0.1:9997 on the default port 9997. By default it is accessible only from localhost; pass -H 0.0.0.0 to allow access from external clients. For more configuration options, run xinference-local --help.

3. Open http://127.0.0.1:9997 to select and deploy the required model. Ensure the model is compatible with your hardware platform.

4. Obtain the model UID from the deployment page.

5. In Dify, go to Settings > Model Providers > Xinference and fill in:
   - Model Name: vicuna-v1.3 (use the name of your deployed model, e.g. qwen-vl-chat)
   - Server URL: http://<Machine_IP>:9997 (replace with your machine's IP address)
   - Model UID: the UID obtained from the deployment page
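A common pitfall with the Server URL: when Dify runs inside Docker, the container cannot reach Xinference through localhost or 127.0.0.1, so the machine's LAN IP must be used. A minimal sketch of composing and sanity-checking that URL (the helper name is mine, not part of Dify or Xinference):

```python
def xinference_server_url(host: str, port: int = 9997) -> str:
    """Build the Server URL for Dify's Xinference provider settings.

    host should be the machine's LAN IP (e.g. "192.168.1.20"): when Dify
    runs inside Docker, the container cannot reach the host via localhost.
    """
    if host in ("localhost", "127.0.0.1"):
        raise ValueError("Use the machine's IP address; Dify's Docker "
                         "container cannot reach the host via localhost")
    return f"http://{host}:{port}"
```

For example, xinference_server_url("192.168.1.20") returns "http://192.168.1.20:9997", which matches the http://<Machine_IP>:9997 form above.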
By following these steps, you should be able to deploy and use the local qwen-vl-chat model without encountering the streaming error. Note that the Xinference implementation currently does not support streaming mode for tool calls, as indicated in the code:
if tools and len(tools) > 0:
    raise InvokeBadRequestError('xinference tool calls does not support stream mode')
Therefore, ensure that you disable streaming when invoking models that involve tool calls [1][2][3][4].
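The guard quoted above can be mirrored as a standalone fail-fast check before invoking the model. This is an illustrative adaptation, not Dify's actual code: the error class is a stand-in, and the check is extended to only fire when streaming is actually requested:

```python
class InvokeBadRequestError(Exception):
    """Stand-in for Dify's InvokeBadRequestError."""

def check_stream_compatibility(tools, stream):
    # Mirrors the guard quoted above: Xinference tool calls cannot be
    # combined with stream mode, so reject the combination up front.
    if stream and tools and len(tools) > 0:
        raise InvokeBadRequestError(
            'xinference tool calls does not support stream mode')

# Fine: streaming without tools.
check_stream_compatibility(tools=[], stream=True)
# Fine: tools with stream=False.
check_stream_compatibility(tools=[{'name': 'search'}], stream=False)
```

Calling it with both tools and stream=True raises InvokeBadRequestError, which is exactly the failure mode to avoid by setting stream=False.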