[Question] confusion on finetuning
GHSADAF opened this issue · comments
GHSADAF commented
Question
I have a custom dataset that I want to finetune LLaVA on. I was wondering is this the format my dataset should have for finetuning LLaVA? and if so, how the answer for "value" for "gpt" should be provided if these values are LLaVA generated answers? or these are ground-truth answers?
[
{
"id": "997bb945-628d-4724-b370-b84de974a19f",
"image": "part-000001/997bb945-628d-4724-b370-b84de974a19f.jpg",
"conversations": [
{
"from": "human",
"value": "<image>\nWrite a prompt for Stable Diffusion to generate this image."
},
{
"from": "gpt",
"value": "a beautiful painting of chernobyl by nekro, pascal blanche, john harris, greg rutkowski, sin jong hun, moebius, simon stalenhag. in style of cg art. ray tracing. cel shading. hyper detailed. realistic. ue 5. maya. octane render. "
},
]
},
...
]
Pedram Aghazadeh commented
The value by gpt is your ground-truth answer.