QwenLM / Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

[Question] Image-understanding Q&A results differ significantly between the web interface and the API call

gjhhust opened this issue · comments

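After many tests: I want to get tags for an image (chosen from a tag list). When I ask on the web interface, the results are fairly normal, similar to the answer below: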
image

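But the results I get when asking through the Python API are very strange:

  1. The model returns my entire tag list back to me in unchanged order (the most frequent case; about half of 10 test runs).
  2. An inference error is returned ("Inference error").
  3. The model ignores my question and simply describes the image.

I never get an answer like the one from the web interface, or anything similar. Why is that?
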
from dashscope import MultiModalConversation
from http import HTTPStatus

def call_with_local_file():
    """Sample of use local file.
       linux&mac file schema: file:///home/images/test.png
       windows file schema: file://D:/images/abc.png
    """
    local_file_path1 = "file:///QwenWQwl2/test_resize_image.png"
    local_file_path2 = 'file://The_local_absolute_file_path2'
    messages = [{
        'role': 'system',
        'content': [{
            'text': 'You are a helpful assistant.'
        }]
    }, {
        'role':
        'user',
        'content': [
            {
                'image': local_file_path1
            },
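            {
                # The Chinese instruction at the end of the prompt below means: "Select 10 tags from the tags list to summarize and describe the image. Answer format: [tag1. tag2. tag3.]"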
                'text': "tags list: Motor vehicle lane, Non-motor vehicle lane, Mixed traffic road, Zebra crossing, Main road, Rural road, Interior alley, Pedestrian crossing, Crossroad, T-junction, Intersection, Railway line, Railway crossing, Overpass, Interchange, Pedestrian overpass, Bridge, Pedestrian overpass entrance/exit, Underground tunnel, Underground pedestrian passage, Tunnel passage area, Mountain tunnel, Tunnel entrance/exit, Pool, River, Lake surface, Outdoor parking lot, Road-marked parking space, City square, Exposed farmland, Forest area, Lawn, Trees, Bus station, Toll booth, Inspection station, Gas station, Sentry box, Vehicle barrier, Pedestrian gate, Security checkpoint machine, Iron gate, Security booth, Door or automatic door, Personnel entrance/exit, Vehicle entrance/exit, Billboard, Banner, Street-front shop, Open-air barbecue stall, Supermarket, Building construction, Road construction, Personnel checkpoint, Vehicle checkpoint, Pedestrian gate, Main entrance/exit of the venue, Security gate, X-ray security machine, Inside elevator, Escalator, Stairs, Steps, Indoor passage, Corridor, Front desk area, Public hall, Indoor parking lot. 从tags list中选择10个标签概括和描述图片,回答格式:[tag1. tag2. tag3.]"
            },
        ]
    }]
    response = MultiModalConversation.call(model=MultiModalConversation.Models.qwen_vl_chat_v1, messages=messages)
    if response.status_code == HTTPStatus.OK:
        print(response)
    else:
        print(response.code)  # The error code.
        print(response.message)  # The error message.


if __name__ == '__main__':
    call_with_local_file()
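
To measure how often each of the failure modes above occurs across repeated API calls, the answer text can be checked automatically. The following is only a minimal sketch: `parse_tags` and `classify_answer` are hypothetical helper names that are not part of dashscope, the answer is assumed to have already been extracted from the response as a plain string, and the 90% echo threshold is an arbitrary choice.

```python
import re


def parse_tags(answer: str) -> list[str]:
    """Extract tags from an answer shaped like '[tag1. tag2. tag3.]'."""
    match = re.search(r"\[(.*?)\]", answer, flags=re.S)
    if match is None:
        return []
    # Tags in the requested format are separated by periods.
    return [t.strip() for t in match.group(1).split(".") if t.strip()]


def classify_answer(answer: str, tag_list: list[str], expected: int = 10) -> str:
    """Label an answer with one of the outcomes described in this issue."""
    distinct = {t.lower() for t in tag_list}
    lowered = answer.lower()

    # Failure mode 1: (almost) the whole tag list is echoed back.
    hits = sum(1 for t in distinct if t in lowered)
    if hits >= 0.9 * len(distinct):  # assumed threshold
        return "echoed_list"

    # Failure mode 3: free-form description with no bracketed tag list.
    tags = parse_tags(answer)
    if not tags:
        return "no_tag_list"

    # The model produced tags that are not in the allowed list.
    if any(t.lower() not in distinct for t in tags):
        return "unknown_tags"

    if len(tags) != expected:
        return "wrong_count"
    return "ok"
```

Running the same request several times and counting the labels gives a rough failure rate per mode, which makes it easier to compare prompt variants. Note that the tag list in the prompt above contains "Pedestrian gate" twice; the sketch deduplicates the list before matching.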

Hello, has a solution to this problem been found yet?