DreamLIP: Language-Image Pre-training with Long Captions

Kecheng Zheng, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen, Yujun Shen
Project Page | Paper | Data

📰 News

[2024/07/26] Long captions (generated by LLaVA-1.5, InstructBLIP, and ShareGPT4V) for CC3M and CC12M are released.
[2024/07/16] Uploaded the pretrained weights of ViT-B/16 trained on CC3M, CC12M, YFCC15M, and merged-30M (with long captions from ShareGPT4V).
[2024/07/08] DreamLIP is accepted by ECCV 2024!

💡 Highlights

🔥 Exploring how language-image pre-training could benefit from long captions.
🔥 Strong improvement on image-text retrieval, semantic segmentation, and image understanding in MLLMs.

🔥 DreamLIP trained with 30M image-text pairs achieves on par or even better performance than CLIP trained with 400M pairs.

🎨 In-Progress

Release long captions of YFCC15M.
Release training code

🏝️ Overview of supported long captions:

Long Captions of Supported Datasets (5)

Long Captions of MLLMs (3)

Generated Long Captions

Raw/Long/Short Caption	InstructBLIP + LLAVA1.5 + ShareGPT4V
CC3M	Link
CC12M	Link
YFCC15M	TODO

Pretrained checkpoints

Dataset	Model	ShareGPT4V	InstructBLIP + LLAVA1.5 + ShareGPT4V
CC3M	ViT-B/16	Link	TODO
CC12M	ViT-B/16	Link	TODO
YFCC15M	ViT-B/16	Link	TODO
CC30M	ViT-B/16	Link	TODO

📣 Instructions

Environment installation

pip install -r requirments.txt

Evaluate zero-shot classification

bash eval_zs.sh
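
For readers unfamiliar with the task, the following is a minimal sketch of how CLIP-style zero-shot classification generally works. It is not the content of eval_zs.sh. The embeddings are hand-written toy vectors standing in for real image and text encoder outputs, and the function names and the logit scale of 100 are illustrative assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_classify(image_emb, class_text_embs, logit_scale=100.0):
    """Return (predicted_index, probabilities) for one image embedding.

    logit_scale is an assumed temperature; a trained model supplies its own value.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in class_text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)

    # Numerically stable softmax over the class logits.
    max_logit = max(logits)
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    pred = max(range(len(probs)), key=probs.__getitem__)
    return pred, probs


# Toy data: one text embedding per class prompt (hypothetical values).
class_names = ["cat", "dog", "car"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.2, 0.9, 0.1]  # closest in direction to the "dog" embedding

pred, probs = zero_shot_classify(image_emb, text_embs)
print(class_names[pred])  # prints "dog"
```

In a real evaluation, the text embeddings would come from encoding one prompt per class name with the text encoder, and the image embedding from the image encoder of the pretrained checkpoint.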

📖 Citation

@inproceedings{DreamLIP,
  title={DreamLIP: Language-Image Pre-training with Long Captions},
  author={Zheng, Kecheng and Zhang, Yifei and Wu, Wei and Lu, Fan and Ma, Shuailei and Jin, Xin and Chen, Wei and Shen, Yujun},
  booktitle={ECCV},
  year={2024}
}

Acknowledgements

This project is built on open_clip; thanks to its authors for the nice work! We also thank the authors of InstructBLIP, ShareGPT4V, and LLaVA for their pretrained models and code.

About

[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions

https://zyf0619sjtu.github.io/dream-lip/
