salesforce/BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

I want to fine-tune the BLIP model on an existing image-text pedestrian dataset. Should I use the pre-trained checkpoint weights or the fine-tuned checkpoint weights?

shams2023 opened this issue

I want to fine-tune the BLIP model on an existing image-text pedestrian dataset. Should I use the pre-trained checkpoint weights or the fine-tuned checkpoint weights?
The generated text should be as detailed as possible, with a length of 40-60.
Looking forward to your answer!
Thank you!
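For context on the 40-60 length requirement: caption length is usually controlled at decode time via `min_length`/`max_length` arguments to the model's `generate` call (the BLIP captioning demo passes arguments of this form), independently of which checkpoint you fine-tune from. The toy sketch below illustrates the mechanism only; `toy_logits` is a hypothetical stand-in for a captioning model, and no BLIP weights are involved.

```python
import math

EOS = 0  # toy end-of-sequence token id

def toy_logits(step, prev_token):
    # Hypothetical stand-in for a captioning model's next-token scores.
    # It always prefers EOS, so output length here is governed entirely
    # by the min/max length constraints applied below.
    return {EOS: 2.0, 1: 1.0, 2: 0.5}

def generate(min_length=40, max_length=60):
    """Greedy decoding with min/max length constraints:
    EOS is masked out until min_length tokens have been emitted,
    and decoding always stops after max_length tokens."""
    tokens = []
    prev = None
    for step in range(max_length):
        logits = dict(toy_logits(step, prev))
        if len(tokens) < min_length:
            logits[EOS] = -math.inf  # forbid ending the caption too early
        tok = max(logits, key=logits.get)
        if tok == EOS:
            break
        tokens.append(tok)
        prev = tok
    return tokens

caption = generate(min_length=40, max_length=60)
# With this toy model the caption ends as soon as EOS is allowed,
# so its length lands exactly at min_length.
```

In a real fine-tuning run you would keep the same idea but pass the bounds to the model's `generate` method (e.g. `min_length=40, max_length=60`); exact argument names depend on the codebase you use, so check its captioning demo.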