magic-research / PLLaVA

Official repository for the paper PLLaVA

Finetuning PLLaVA: images for video analysis

sureshtmca opened this issue

My objective is video analysis of a specific subject. I have images pertaining to that subject, not videos.
Question: can I use images to finetune PLLaVA, with PLLaVA_7b as the pretrained model? If so, can the training dataset use the format below?
{"row_idx":0,"row":{"image":{"src":".. / IMG1.jpg","height":4032,"width":3024},"caption":"xyxxxyxxx"},"truncated_cells":[]}
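
For reference, a row in the format quoted above can be unpacked into an (image path, caption) pair with the standard library. This is only a sketch of reading that one row: `parse_row` is a hypothetical helper name, and nothing in this thread establishes that PLLaVA's training code accepts such rows directly.

```python
import json


def parse_row(line: str) -> tuple[str, str]:
    """Extract (image path, caption) from one JSON row of the quoted format."""
    record = json.loads(line)
    row = record["row"]
    return row["image"]["src"], row["caption"]


# The sample row from the question, copied verbatim.
sample = (
    '{"row_idx":0,"row":{"image":{"src":".. / IMG1.jpg","height":4032,'
    '"width":3024},"caption":"xyxxxyxxx"},"truncated_cells":[]}'
)

src, caption = parse_row(sample)
print(src, caption)
```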

Interesting perspective.

The simplest way here is to preprocess the images, convert them to GIF files, and then run the training code untouched.
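
A minimal sketch of that preprocessing step, assuming Pillow is installed. The helper names `frame_list` and `write_gif`, the default of 16 repeated frames, and the 100 ms frame duration are all placeholders chosen for illustration, not values taken from the PLLaVA code.

```python
from pathlib import Path


def frame_list(image_paths, still_frames=16):
    """Return the ordered frame paths for one clip.

    A single still image is repeated `still_frames` times (an assumed,
    arbitrary count); a sequence of several images is kept as given.
    """
    paths = [Path(p) for p in image_paths]
    if not paths:
        raise ValueError("need at least one image")
    if len(paths) == 1:
        return paths * still_frames
    return paths


def write_gif(frame_paths, dst, duration_ms=100):
    """Write the given frames to an animated GIF at `dst` using Pillow."""
    # Imported here so that frame_list() stays usable without Pillow.
    from PIL import Image

    frames = [Image.open(p).convert("RGB") for p in frame_paths]
    frames[0].save(
        dst,
        format="GIF",
        save_all=True,              # write every frame, not just the first
        append_images=frames[1:],
        duration=duration_ms,       # display time per frame in milliseconds
        loop=0,                     # loop forever
    )


# Hypothetical usage with a single still image:
# write_gif(frame_list(["IMG1.jpg"]), "IMG1.gif")
```

One caveat: Pillow's GIF writer may merge consecutive identical frames into one frame with a longer duration, so a repeated still can end up as a single-frame file. It is worth checking how many frames the training data loader actually reads from such a GIF before starting a run.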

If I understand you correctly, this would reveal the model's understanding of static videos of objects?

Thank you.
Regarding your suggestion: if I had several sequences of images, each reflecting the activity of a subject, then after image-to-GIF conversion each sequence would produce movement in the GIF file. But each of my images is a standalone static visual, and after conversion it is still standalone and static, without any movement. Does this still work for PLLaVA? Believe me, if I can do this, it will make a beautiful domain-specific LVLM.

Since I couldn't wait for a response, I found a way to deal with this issue. Let me try training that way.