haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Home Page: https://llava.hliu.cc

[Question] How to evaluate pretraining [image-text alignment] performance?

enkaranfiles opened this issue

Question

I swapped in a different vision encoder as the vision tower and trained it on new custom data that I gathered from another domain. How can I evaluate the pretraining performance? Pretraining is the crucial stage for image-text alignment, so I think it has to be measured. Thanks in advance to anyone who can respond!