OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Can we use in-context multimodal data for finetuning?

waltonfuture opened this issue · comments

Thanks for your great work! However, it seems that we can only use data that contains one image for SFT. Can we use in-context multimodal data (i.e., containing multiple images) for finetuning?

yes, the code supports multi-image finetuning

Thank you. How should I organize my data for multi-image SFT? And how do I run inference with multiple images?
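For what it's worth, here is a minimal sketch of what a multi-image SFT sample could look like. This is an assumption, not confirmed by the maintainers: it follows the general shape of the repo's single-image finetuning JSON (an `"image"` field plus a `"conversations"` list), with hypothetical `<image_00>`/`<image_01>` placeholder keys mapping to image paths and referenced in the prompt text. Check the official finetune docs before using this layout.

```python
import json

# Hypothetical multi-image SFT sample (schema is an assumption, see above).
# The "image" field maps placeholder tags to image paths; the user turn
# references each tag so the loader knows where each image belongs.
sample = {
    "id": "0",
    "image": {
        "<image_00>": "path/to/first.jpg",   # assumed placeholder keys
        "<image_01>": "path/to/second.jpg",
    },
    "conversations": [
        {
            "role": "user",
            "content": "<image_00>\n<image_01>\nCompare the two images.",
        },
        {"role": "assistant", "content": "The first image shows ..."},
    ],
}

# Sanity-check: every declared image placeholder appears in the prompt.
prompt = sample["conversations"][0]["content"]
assert all(tag in prompt for tag in sample["image"])

# The training file would be a JSON list of such samples.
print(len(json.dumps([sample])) > 0)
```

The key design point is that each image gets an explicit placeholder in the text, so the order and position of images in the prompt is unambiguous.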

Same problem here. Any update on multi-image sft?

@qyc-98 Hello! Can you provide some simple examples of in-context inference or SFT? Thanks a lot!

@qyc-98 I have encountered the same problem. Have you resolved it?
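For multi-image inference, a plausible shape of the request is shown below. This is a sketch under an assumption: newer MiniCPM-V releases document passing several PIL images inside a single user turn via `msgs` (with `image=None`), but whether MiniCPM-Llama3-V 2.5 accepts this should be verified against the model card. The `model.chat(...)` call is left commented out since it needs loaded weights.

```python
from PIL import Image

# Stand-in images; in practice these would be Image.open("...") on real files.
image1 = Image.new("RGB", (448, 448), "white")
image2 = Image.new("RGB", (448, 448), "black")

# Assumed message format: images listed first in the content list,
# followed by the text question, all inside one user turn.
msgs = [
    {
        "role": "user",
        "content": [image1, image2, "What differs between these two images?"],
    }
]

# The actual call would look roughly like this (requires model + tokenizer):
# answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)

# Count how many images the turn carries.
n_images = sum(isinstance(part, Image.Image) for part in msgs[0]["content"])
print(n_images)  # 2
```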