zyf0619sjtu / DreamLIP

[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions

Home Page: https://zyf0619sjtu.github.io/dream-lip/

DreamLIP: Language-Image Pre-training with Long Captions

Kecheng Zheng, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen, Yujun Shen
Project Page | Paper | Data

📰 News

  • [2024/07/26] Long captions (LLaVA-1.5, InstructBLIP, and ShareGPT4V) of CC3M and CC12M are released.
  • [2024/07/16] Uploaded the pretrained weights of ViT-B/16 trained on CC3M, CC12M, YFCC15M, and merged-30M (long captions of ShareGPT4V).
  • [2024/07/08] DreamLIP is accepted by ECCV 2024!

💡 Highlights

  • 🔥 Exploring how language-image pre-training could benefit from long captions.
  • 🔥 Strong improvement on image-text retrieval, semantic segmentation, and image understanding in MLLMs.
  • 🔥 DreamLIP trained with 30M image-text pairs achieves performance on par with, or even better than, CLIP trained with 400M pairs.

🎨 In-Progress

  • Release long captions of YFCC15M.
  • Release training code.

🏝️ Overview of supported long captions:

Long Captions of Supported Datasets (5)
Long Captions of MLLMs (3)

Generated Long Captions

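| Dataset | Raw/Long/Short Caption (InstructBLIP + LLaVA-1.5 + ShareGPT4V) |
|---------|------------------------------------------------------------------|
| CC3M    | Link |
| CC12M   | Link |
| YFCC15M | TODO |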

Pretrained checkpoints

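| Dataset | Model    | ShareGPT4V | InstructBLIP + LLaVA-1.5 + ShareGPT4V |
|---------|----------|------------|----------------------------------------|
| CC3M    | ViT-B/16 | Link       | TODO |
| CC12M   | ViT-B/16 | Link       | TODO |
| YFCC15M | ViT-B/16 | Link       | TODO |
| CC30M   | ViT-B/16 | Link       | TODO |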

📣 Instructions

Environment installation

pip install -r requirments.txt

Evaluate zero shot classification

bash eval_zs.sh
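
For readers unfamiliar with the task, the following is a minimal sketch of how CLIP-style zero-shot classification works in general. It is not the contents of eval_zs.sh. The function names, the toy three-dimensional embeddings, the prompt strings, and the `logit_scale` value of 100 are all assumptions made for illustration; in practice the embeddings would come from the pretrained image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, class_embeddings, logit_scale=100.0):
    """Pick the class whose text embedding is most similar to the image embedding.

    class_embeddings maps a prompt (e.g. "a photo of a cat") to its text embedding.
    logit_scale is a hypothetical temperature; 100.0 is an assumed value.
    """
    image = l2_normalize(image_embedding)
    names = list(class_embeddings)
    logits = []
    for name in names:
        text = l2_normalize(class_embeddings[name])
        cosine = sum(a * b for a, b in zip(image, text))
        logits.append(logit_scale * cosine)
    probs = softmax(logits)
    best = max(range(len(names)), key=probs.__getitem__)
    return names[best], dict(zip(names, probs))


# Toy embeddings (hypothetical); real ones come from the encoders.
toy_class_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
toy_image_embedding = [0.8, 0.2, 0.1]

label, probabilities = zero_shot_classify(toy_image_embedding, toy_class_embeddings)
print(label)
```

No class-specific training is involved: the class set is defined entirely by the text prompts, so changing the label set only requires encoding new prompts.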

📖 Citation

@inproceedings{DreamLIP,
  title={DreamLIP: Language-Image Pre-training with Long Captions},
  author={Zheng, Kecheng and Zhang, Yifei and Wu, Wei and Lu, Fan and Ma, Shuailei and Jin, Xin and Chen, Wei and Shen, Yujun},
  booktitle={ECCV},
  year={2024}
}

Acknowledgements

This project is based on open_clip; thanks for the nice work! We also thank InstructBLIP, ShareGPT4V, and LLaVA for the pretrained models and code.

About

License: Other


Languages

Python 99.0%, Shell 0.9%, Makefile 0.1%