![BPO](https://raw.githubusercontent.com/pipilurj/bootstrapped-preference-optimization-BPO-/main/images/logo.png)
Generated by DALL·E 3
This repository contains the code for the paper "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization". [Link to our paper](https://arxiv.org/abs/2403.08730)
```shell
conda create -n bpo python=3.10 -y
conda activate bpo
pip install -e .
```
- Download ShareGPT4V from here
- Download COCO from here
- Download the dataset annotations from here
Extract the data from ShareGPT4V and organize the images as follows:

```
Image_root
├── coco/
│   └── train2017/
├── llava/
│   └── llava_pretrain/
├── sam/
├── share_textvqa/
│   └── images/
├── web-celebrity/
│   └── images/
├── web-landmark/
│   └── images/
└── wikiart/
    └── images/
```
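Before launching training, it can help to verify that the image folders are laid out as expected. A minimal sketch (the `missing_image_dirs` helper and its name are our own, not part of this repo; the sub-directory list is taken from the tree above):

```python
from pathlib import Path

# Expected layout under Image_root, per the directory tree above.
EXPECTED_SUBDIRS = [
    "coco/train2017",
    "llava/llava_pretrain",
    "sam",
    "share_textvqa/images",
    "web-celebrity/images",
    "web-landmark/images",
    "wikiart/images",
]

def missing_image_dirs(image_root):
    """Return the expected sub-directories that are absent under image_root."""
    root = Path(image_root)
    return [d for d in EXPECTED_SUBDIRS if not (root / d).is_dir()]
```

Running `missing_image_dirs("/path/to/Image_root")` should return an empty list once every dataset has been extracted into place.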
```shell
bash scripts/finetune_bpo.sh
```
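The finetuning script optimizes a preference objective in the DPO family. As background, the standard per-pair DPO loss that such methods build on can be sketched in plain Python; the function and argument names here are illustrative only, not taken from this repo (the actual implementation lives in the training code built on trl):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the trainable policy or the frozen
    reference model; beta scales the implicit reward.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)) written stably as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))
```

When the policy assigns no extra margin to the chosen response over the reference model, the loss sits at `log 2`; raising the policy's log-probability of the chosen response relative to the rejected one drives the loss down.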
The project is built on top of the amazing multimodal large language model LLaVA, the RLHF package trl, and Silkie, which applies DPO to multimodal learning. Thanks for these great works!
If you find our work useful for your research or applications, please cite using this BibTeX:
```bibtex
@misc{pi2024strengthening,
      title={Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization},
      author={Renjie Pi and Tianyang Han and Wei Xiong and Jipeng Zhang and Runtao Liu and Rui Pan and Tong Zhang},
      year={2024},
      eprint={2403.08730},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```