zengyan-97 / X-VLM

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

Fine-tuning

TheodorPatrickZ opened this issue

Hello,

How did you fine-tune your model on the Flickr30K dataset?
Did you freeze the text and vision encoders and fine-tune only the itm_head, or did you fine-tune the whole model?

Hi,

I fine-tuned the whole model. Please read the code for details.
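
For readers comparing the two options raised in the question, here is a minimal PyTorch sketch of the difference between training only a matching head and training the whole model. It is not taken from the X-VLM code: `ToyRetrievalModel`, its layer sizes, the helper functions, and the learning rate are all hypothetical, and only the name `itm_head` is borrowed from the question. The repository's own fine-tuning scripts remain the authority on what was actually done.

```python
import torch
from torch import nn


class ToyRetrievalModel(nn.Module):
    """Hypothetical stand-in for illustration; NOT the X-VLM architecture."""

    def __init__(self, dim: int = 8):
        super().__init__()
        self.vision_encoder = nn.Linear(dim, dim)
        self.text_encoder = nn.Linear(dim, dim)
        self.itm_head = nn.Linear(dim, 2)  # match / no-match logits


def set_trainable(model: nn.Module, train_encoders: bool) -> None:
    """Keep itm_head trainable; optionally freeze everything else."""
    for name, param in model.named_parameters():
        is_head = name.startswith("itm_head.")
        param.requires_grad = train_encoders or is_head


def count_trainable(model: nn.Module) -> int:
    """Number of parameter elements that will receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


model = ToyRetrievalModel()

# Option A: freeze both encoders and train only the head.
set_trainable(model, train_encoders=False)
head_only = count_trainable(model)

# Option B: fine-tune the whole model (what the reply above describes).
set_trainable(model, train_encoders=True)
full = count_trainable(model)

# Pass only trainable parameters to the optimizer (learning rate is an assumed value).
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-5)

print(f"head only: {head_only} trainable parameters, full model: {full}")
```

Freezing the encoders shrinks the set of updated parameters to the head alone, which is cheaper but leaves the pretrained features unchanged. Whole-model fine-tuning updates every parameter, as in the answer above.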