![BPO](https://raw.githubusercontent.com/pipilurj/bootstrapped-preference-optimization-BPO-/main/images/logo.png)
Generated by DALL·E 3
This repository contains the code for the paper "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization". [Link to our paper](https://arxiv.org/abs/2403.08730)
```shell
conda create -n bpo python=3.10 -y
conda activate bpo
pip install -e .
```
- Download ShareGPT4V from here
- Download COCO from here
- Download the dataset annotations from here
Extract the data from ShareGPT4V and organize the images as follows:

```
Image_root
├── coco/
│   └── train2017/
├── llava/
│   └── llava_pretrain/
├── sam/
├── share_textvqa/
│   └── images/
├── web-celebrity/
│   └── images/
├── web-landmark/
│   └── images/
└── wikiart/
    └── images/
```
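Before launching training, it can help to verify that the image folders are laid out as expected. A minimal sketch (the `missing_image_dirs` helper and its name are our own, not part of this repo; the sub-directory list is taken from the tree above):

```python
from pathlib import Path

# Expected layout under Image_root, per the directory tree above.
EXPECTED_SUBDIRS = [
    "coco/train2017",
    "llava/llava_pretrain",
    "sam",
    "share_textvqa/images",
    "web-celebrity/images",
    "web-landmark/images",
    "wikiart/images",
]

def missing_image_dirs(image_root):
    """Return the expected sub-directories that are absent under image_root."""
    root = Path(image_root)
    return [d for d in EXPECTED_SUBDIRS if not (root / d).is_dir()]
```

Running `missing_image_dirs("/path/to/Image_root")` should return an empty list once every dataset has been extracted into place.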
```shell
bash scripts/finetune_bpo.sh
```
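The finetuning script optimizes a preference objective in the DPO family. As background, the standard per-pair DPO loss that such methods build on can be sketched in plain Python; the function and argument names here are illustrative only, not taken from this repo (the actual implementation lives in the training code built on trl):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the trainable policy or the frozen
    reference model; beta scales the implicit reward.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)) written stably as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))
```

When the policy assigns no extra margin to the chosen response over the reference model, the loss sits at `log 2`; raising the policy's log-probability of the chosen response relative to the rejected one drives the loss down.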
The project is built on top of the amazing multimodal large language model LLaVA, the RLHF package trl, and Silkie, which applies DPO to multimodal learning. Thanks for these great works!
If you find our work useful for your research or applications, please cite using this BibTeX:
```bibtex
@misc{pi2024strengthening,
      title={Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization},
      author={Renjie Pi and Tianyang Han and Wei Xiong and Jipeng Zhang and Runtao Liu and Rui Pan and Tong Zhang},
      year={2024},
      eprint={2403.08730},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```