nopperl / clip-synthetic-captions

Tiny-scale experiment showing that CLIP models trained on detailed captions generated by multimodal models (CogVLM and LLaVA 1.5) outperform models trained on the original alt-texts across a range of classification and retrieval tasks.
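Whatever the caption source, CLIP training optimizes a symmetric contrastive (InfoNCE) objective over image–text pairs. As a rough illustration of that objective (a NumPy sketch, not the repo's actual training code; embedding sizes and the temperature value are arbitrary assumptions):

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # Normalize embeddings to unit length so dot products are cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    # Pairwise similarity matrix, scaled by temperature
    logits = image_emb @ text_emb.T / temperature
    n = logits.shape[0]
    idx = np.arange(n)
    # Cross-entropy in both directions: image->text (rows) and text->image (columns)
    log_softmax_rows = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_softmax_cols = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    loss_img = -log_softmax_rows[idx, idx].mean()
    loss_txt = -log_softmax_cols[idx, idx].mean()
    return (loss_img + loss_txt) / 2

# Toy batch: 4 pairs of 8-dimensional embeddings (random, for illustration only)
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
txt = rng.normal(size=(4, 8))
loss_random = clip_contrastive_loss(img, txt)
loss_matched = clip_contrastive_loss(img, img)  # perfectly aligned pairs
```

The intuition behind the experiment is that richer synthetic captions give this objective a stronger training signal than short, noisy alt-texts.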
