finetuning PLLaVA - IMAGEs for Video Analysis

Question

sureshtmca opened this issue a month ago

My objective is 'video analysis' of a specific subject. I have images of that subject, not videos. Question: can I use images to fine-tune PLLaVA, with 'PLLaVa_7b' as the pretrained model? If so, can the training dataset use the format below?
{"row_idx":0,"row":{"image":{"src":".. / IMG1.jpg","height":4032,"width":3024},"caption":"xyxxxyxxx"},"truncated_cells":[]}
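For reference, the record above looks like a row exported from the Hugging Face dataset viewer (`row_idx`, `row`, `truncated_cells`). Here is a minimal sketch of pulling out the two fields a captioning run needs, assuming one such JSON object per line. `parse_row` is a hypothetical helper, and the relative path in the sample line is made up for illustration:

```python
import json


def parse_row(line: str) -> tuple[str, str]:
    """Return (image_src, caption) from one dataset-viewer style row."""
    record = json.loads(line)
    row = record["row"]
    image = row["image"]
    # height/width are metadata only; training needs the path and the text.
    # Missing keys raise KeyError, which surfaces malformed rows early.
    return image["src"], row["caption"]


# Hypothetical sample row (path invented for the example).
line = ('{"row_idx":0,"row":{"image":{"src":"images/IMG1.jpg",'
        '"height":4032,"width":3024},"caption":"xyxxxyxxx"},"truncated_cells":[]}')
src, caption = parse_row(line)
print(src, caption)
```

Whatever annotation format the training config expects, these two values (image path and caption) are what has to be carried over.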

ermu2001 · Answer 1 · Fri May 10 2024 21:31:34 GMT+0800 (China Standard Time)

Interesting perspective.

The simplest way is to preprocess the images by converting them into GIF files, then run the training code unchanged.

If I understand you correctly, this would test the model's understanding of static videos of objects?
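Here is a minimal sketch of the bookkeeping half of the GIF suggestion. It assumes two things: a separate conversion step (for example, an image library) writes each GIF next to its source image, and the training annotation points at a video path plus a caption. The field names `video` and `caption` and the helpers `to_gif_record` and `to_jsonl` are hypothetical. Check them against the annotation format the PLLaVA training config actually reads:

```python
import json
from pathlib import PurePosixPath


def to_gif_record(image_src: str, caption: str) -> dict:
    # Assumed convention: the conversion step saves IMG1.jpg as IMG1.gif
    # in the same folder, so only the file suffix changes.
    gif_path = PurePosixPath(image_src).with_suffix(".gif")
    # "video" and "caption" are hypothetical field names, not a confirmed PLLaVA schema.
    return {"video": str(gif_path), "caption": caption}


def to_jsonl(records: list[dict]) -> str:
    # One JSON object per line.
    return "\n".join(json.dumps(r) for r in records)


records = [to_gif_record("images/IMG1.jpg", "xyxxxyxxx")]
print(to_jsonl(records))
# {"video": "images/IMG1.gif", "caption": "xyxxxyxxx"}
```

Keeping the caption untouched and only swapping the media path is what lets the training code run without changes, as suggested.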

sureshtmca · Answer 2 · Wed May 15 2024 17:41:35 GMT+0800 (China Standard Time)

Thank you.
Regarding your suggestion: if I had a sequence of images showing a subject's activity, converting them to a GIF would stitch them together into visible movement. But each of my images is a standalone static visual, so after conversion it is still a single static frame with no movement. Does this still work for PLLaVA? Believe me, if I can make this work, it will produce a beautiful domain-specific LVLM.
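One way to reason about this question: video LLM training pipelines generally sample a fixed number of frames uniformly from each clip. The sketch below assumes 16 samples and midpoint-of-segment sampling. The real count and strategy come from the PLLaVA config, and `uniform_frame_indices` is a hypothetical helper. With a single-frame GIF, every sampled index points at frame 0. The model then sees the same still image repeated, and the temporal dimension carries no motion:

```python
def uniform_frame_indices(clip_len: int, num_samples: int = 16) -> list[int]:
    """Pick num_samples frame indices, one from the middle of each equal segment.

    The default of 16 samples is an assumption, not a confirmed PLLaVA setting.
    """
    if clip_len < 1 or num_samples < 1:
        raise ValueError("clip_len and num_samples must be positive")
    segment = clip_len / num_samples
    # Clamp to the last frame in case of floating-point edge cases.
    return [min(int(segment * i + segment / 2), clip_len - 1) for i in range(num_samples)]


# A 32-frame clip with real motion: samples spread across the clip.
print(uniform_frame_indices(32, 4))  # [4, 12, 20, 28]
# A single-frame GIF made from one still image: 16 copies of index 0.
print(uniform_frame_indices(1))
```

Under these assumptions, training on such GIFs amounts roughly to image-caption training routed through the video pipeline. Whether that transfers to domain-specific video analysis is something to check empirically.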

sureshtmca · Answer 3 · Fri May 17 2024 13:16:21 GMT+0800 (China Standard Time)

Since I couldn't wait for a response, I found a way to deal with this issue. I'll try training that way.