Fine-tuning BLIP on an existing image-text pedestrian dataset: should I start from the pre-trained checkpoint weights or the fine-tuned checkpoint weights?

shams2023 opened this issue 2 months ago · comments

shams2023 commented 2 months ago

I want to fine-tune the BLIP model on an existing image-text pedestrian dataset. Should I start from the pre-trained checkpoint weights or from the fine-tuned checkpoint weights?
The generated text should be as detailed as possible, with a length of 40-60!
Looking forward to your answer!
Thank you!