Weird Output for simple image


mridulbirla opened this issue a year ago

I tried testing this with a sample image and the following modification to test_cheetah_llama2.py:

```python
######## Example 0 ######
# `chat` is assumed to be the chat object already set up earlier in test_cheetah_llama2.py
print("\nExample 0")
context = "<Img>HereForImage</Img> what does picture is about? "
raw_img_list = ['./examples/screenshot.jpg']
print("Question: ", context)
llm_message = chat.answer(raw_img_list, context)
print("Answer: ", llm_message)
```

The output looks super weird. Is there something I am doing wrong, or have you also encountered the same kind of issue?
I am using Llama 2.

```text
Example 0
Question:  <Img>HereForImage</Img> what does picture is about?
Answer:   When the page loads, you will see an image of a person sitting at a desk with a laptop open in front of them. The person is wearing a blue shirt and has a blue headset on their head. There is a blue book on the desk in front of them.
```
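Before attributing the odd answer to the model, it may be worth ruling out a malformed input. The sketch below is a hypothetical pre-flight check: the helper name `check_prompt_inputs` is ours and is not part of the Cheetah repo, and it assumes (not confirmed by the Cheetah code) that each `<Img>HereForImage</Img>` placeholder in the prompt should be matched by exactly one path in `raw_img_list`, and that every path should point to an existing file.

```python
import os

# Placeholder tag used in the prompt above. ASSUMPTION: each occurrence is
# paired with one entry of raw_img_list; this is not confirmed by Cheetah docs.
IMG_TAG = "<Img>HereForImage</Img>"


def check_prompt_inputs(context, raw_img_list):
    """Hypothetical helper: return a list of detected problems (empty if none)."""
    problems = []
    # Compare the number of image placeholders with the number of image paths.
    n_tags = context.count(IMG_TAG)
    if n_tags != len(raw_img_list):
        problems.append(
            f"{n_tags} image placeholder(s) but {len(raw_img_list)} image path(s)"
        )
    # Make sure every listed image actually exists on disk.
    for path in raw_img_list:
        if not os.path.isfile(path):
            problems.append(f"image not found: {path}")
    return problems


if __name__ == "__main__":
    context = "<Img>HereForImage</Img> what does picture is about? "
    raw_img_list = ["./examples/screenshot.jpg"]
    for problem in check_prompt_inputs(context, raw_img_list):
        print("Warning:", problem)
```

If the check reports nothing, the inputs are at least consistent with that assumed convention, and the unexpected answer is more likely to come from the model itself.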

Zhiqi Ge · Sat Aug 26 2023 15:13:43 GMT+0800 (China Standard Time)

Thank you for bringing this to our attention. We appreciate your effort in testing and sharing the feedback.

Upon reviewing the case you presented, we observed similar issues not only with Cheetah but also with other multimodal LLMs, so this appears to be a general problem for multimodal LLMs rather than one specific to Cheetah.

Rest assured, we recognize the importance of addressing this issue and will work towards finding a solution in upcoming updates.