haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Home Page: https://llava.hliu.cc

[Question] How to create captions for custom image data.

yspch2022 opened this issue · comments

Question

Hello, I'm building instruction-following data for my own custom images, following the paper and reference documents, and I have a question.
When constructing instruction-following data, 'Context type 1: Captions' uses captions that describe the image.
How do I generate those captions for my images? I assume they are produced by feeding the image into a model such as BLIP, LLaVA, or GPT-4 — is that right?
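For reference, here is a minimal sketch of how generated captions could be packaged into LLaVA-style training samples. The `stub_captioner` below is a hypothetical placeholder for whatever captioning model is used (e.g. BLIP via `transformers`, or a GPT-4 API call); the `"id"`/`"image"`/`"conversations"` fields follow the conversation format used by the public LLaVA training JSON:

```python
import json

def build_caption_entry(image_id, image_file, caption,
                        prompt="Describe the image concisely."):
    # Assemble one instruction-following sample in the LLaVA-style
    # conversation format: the human turn carries the <image> token
    # plus an instruction, and the gpt turn carries the caption.
    return {
        "id": image_id,
        "image": image_file,
        "conversations": [
            {"from": "human", "value": f"<image>\n{prompt}"},
            {"from": "gpt", "value": caption},
        ],
    }

def stub_captioner(image_file):
    # Hypothetical stand-in for a real captioning model such as BLIP.
    return f"A placeholder caption for {image_file}"

if __name__ == "__main__":
    entry = build_caption_entry("0001", "custom/dog.jpg",
                                stub_captioner("custom/dog.jpg"))
    print(json.dumps(entry, indent=2))
```

Swapping `stub_captioner` for a real model call would give a basic captions-only pipeline; whether those captions are good enough for training is a separate question.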