OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Multiple frames from video

Tsardoz opened this issue

Does it work with multiple frames?
I tried reading sequential frames from a folder, converting each one to base64 and appending it to the inputs, but I get an error when calling chat_model.chat(inputs). Is this supported?
test_video.txt
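The frame-loading step described in the question can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the frames are already extracted as JPEG/PNG files whose names are zero-padded (e.g. frame_0001.jpg), so that sorting by filename matches temporal order. `load_frames_as_base64` and `FRAME_EXTENSIONS` are hypothetical names, not part of MiniCPM-V, and the sketch deliberately stops short of calling `chat_model.chat`, because the format that method expects for multiple images is defined by the model code, not here.

```python
import base64
from pathlib import Path
from typing import List

# Hypothetical choice of file types treated as extracted video frames.
FRAME_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def load_frames_as_base64(folder: str) -> List[str]:
    """Return the frame images in `folder` as base64 strings, in filename order.

    Hypothetical helper: assumes zero-padded filenames so that lexical
    sorting reproduces the order of the frames in the video.
    """
    paths = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in FRAME_EXTENSIONS
    )
    encoded = []
    for path in paths:
        # Read the raw image bytes and base64-encode them; decode to str
        # so the result can be embedded in JSON or other text payloads.
        encoded.append(base64.b64encode(path.read_bytes()).decode("ascii"))
    return encoded
```

Calling `load_frames_as_base64("frames/")` yields one string per frame; for long videos it is usually worth sampling only a subset of frames before sending them, since every image adds to the model's input length.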

I have the same issue. I tried feeding the model multiple images, and the response I got was "image encoder error". Looking at the code in chat.py, I found that the chat method of the MiniCPMV class only accepts a single image. I am also curious whether the model can read multiple images at the same time in a conversation, as GPT-4 can.

Hi, this is a very good thing to try. The model is capable of taking multiple images as input. However, it was not trained on video scenarios, so it may not perform well on video frames. You can give it a try.
Please refer to this link:
https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/discussions/2