haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Home Page: https://llava.hliu.cc

[Question] confusion on finetuning

GHSADAF opened this issue

Question

I have a custom dataset that I want to finetune LLaVA on. Is the format below what my dataset should follow for finetuning LLaVA? If so, how should the "value" field for "gpt" be filled in: should these values be LLaVA-generated answers, or ground-truth answers?

[
  {
    "id": "997bb945-628d-4724-b370-b84de974a19f",
    "image": "part-000001/997bb945-628d-4724-b370-b84de974a19f.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWrite a prompt for Stable Diffusion to generate this image."
      },
      {
        "from": "gpt",
        "value": "a beautiful painting of chernobyl by nekro, pascal blanche, john harris, greg rutkowski, sin jong hun, moebius, simon stalenhag. in style of cg art. ray tracing. cel shading. hyper detailed. realistic. ue 5. maya. octane render. "
      }
    ]
  },
  ...
]

The "value" for "gpt" is your ground-truth answer.
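
For reference, here is a minimal sketch of how such a file could be assembled from your own (image path, instruction, ground-truth answer) triples. The helper name `build_llava_records` and the output filename are hypothetical; the record layout mirrors the example above, with the ground-truth answer placed in the "gpt" turn.

```python
import json
import uuid


def build_llava_records(samples):
    """Convert (image_path, instruction, answer) triples into LLaVA's
    finetuning JSON structure. The "gpt" value holds the ground-truth
    answer the model is trained to produce."""
    records = []
    for image_path, instruction, answer in samples:
        records.append({
            "id": str(uuid.uuid4()),
            "image": image_path,
            "conversations": [
                # "<image>\n" prefixes the first human turn so the image
                # token appears before the instruction text.
                {"from": "human", "value": f"<image>\n{instruction}"},
                # Ground-truth answer (not a model-generated one).
                {"from": "gpt", "value": answer},
            ],
        })
    return records


if __name__ == "__main__":
    # Hypothetical sample; replace with your own dataset entries.
    samples = [
        ("part-000001/example.jpg",
         "Write a prompt for Stable Diffusion to generate this image.",
         "a beautiful painting of chernobyl by nekro, ..."),
    ]
    with open("custom_finetune.json", "w") as f:
        json.dump(build_llava_records(samples), f, indent=2)
```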