findalexli / mllm-dpo

Repo associated with the paper Multi-modal preference alignment remedies regression of visual instruction tuning on language model

This repo contains the code and the data for the following paper:

@misc{li2024multimodal,
    title={Multi-modal preference alignment remedies regression of visual instruction tuning on language model},
    author={Shengzhi Li and Rongyu Lin and Shichao Pei},
    year={2024},
    eprint={2402.10884},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

[arXiv paper] [GitHub] [Data] [Model]

Developers: Shengzhi Li (TIFIN.AI), Rongyu Lin (KAUST), Shichao Pei (University of Massachusetts Boston)
Affiliations: TIFIN, KAUST, University of Massachusetts Boston
Contact Information: alex.li@tifin.com, rongyu.lin@kaust.edu.sa, shichao.pei@umb.edu

Contents

  1. Introduction
  2. Installation
  3. Data Preparation
  4. Training
  5. Evaluation

Introduction

This guide provides step-by-step instructions for fine-tuning the LLaVA model with the alignment methods studied in the paper and for evaluating the resulting models, focusing on visual instruction tuning with the SciGraphQA and LRV-Instruct datasets.

Installation

  1. Unzip the repository and navigate into its folder (a clone-based alternative is sketched after this list):

  2. Set up the environment:

    conda create -n llava python=3.10 -y
    conda activate llava
    pip install --upgrade pip
    pip install -e .
  3. Install packages for training:

    pip install -e ".[train]"
    pip install flash-attn --no-build-isolation
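
If you prefer to clone rather than unzip, a minimal sketch (assuming the repository lives at the GitHub path findalexli/mllm-dpo shown at the top of this page):

```
# Clone the repository and enter it (URL inferred from the repo path above)
git clone https://github.com/findalexli/mllm-dpo.git
cd mllm-dpo
```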

Data Preparation

  1. Download datasets and images:

    The images for LRV-Instruct can be downloaded with: gdown https://drive.google.com/uc?id=1k9MNV-ImEV9BYEOeLEIb4uGEUZjd3QbM

    The images for SciGraphQA can be downloaded from: https://huggingface.co/datasets/alexshengzhili/SciGraphQA-295K-train/resolve/main/img.zip?download=true

    A combined download-and-layout example is given at the end of this section.

  2. Organize the images in ./playground/data:

```
playground/
└── data/
    ├── scigraphqa/
    │   └── images/
    └── lrv_instruct/
        └── images/
```
  3. For the DPO data, see playground/data/dpo_inference0104.with_logpllava-v1.5-13b_2024-02-03.json
  4. For the non-DPO alignment methods (SteerLM, rejection sampling, and standard SFT), the corresponding data files are also provided in the data folder: playground/data/steerlm.json, playground/data/rejection_sampling.json, and playground/data/standard_sft.json
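
The download and layout steps above can be combined as in the sketch below; the local archive names are arbitrary, and whether each zip unpacks directly into an images/ folder is an assumption, so adjust the unzip targets to match the tree shown above:

```
# LRV-Instruct images (Google Drive ID from step 1); requires gdown (pip install gdown)
mkdir -p playground/data/lrv_instruct/images
gdown "https://drive.google.com/uc?id=1k9MNV-ImEV9BYEOeLEIb4uGEUZjd3QbM" -O lrv_images.zip
unzip -q lrv_images.zip -d playground/data/lrv_instruct/images

# SciGraphQA images (Hugging Face dataset from step 1)
mkdir -p playground/data/scigraphqa/images
wget -O scigraphqa_img.zip "https://huggingface.co/datasets/alexshengzhili/SciGraphQA-295K-train/resolve/main/img.zip?download=true"
unzip -q scigraphqa_img.zip -d playground/data/scigraphqa/images
```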

Training

  1. Use scripts/v1/finetune_dpo.sh for DPO experiments.
  2. Use scripts/v1/finetune_steer.sh for non-DPO experiments (SteerLM, rejection sampling, and standard SFT); a sample invocation is shown below.
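
Both scripts are launched from the repository root; the sketch below assumes you have already edited the data, model, and output paths inside each script to match your setup:

```
# DPO fine-tuning
bash scripts/v1/finetune_dpo.sh

# Non-DPO alignment methods (SteerLM, rejection sampling, standard SFT)
bash scripts/v1/finetune_steer.sh
```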

Evaluation

  1. Use the provided evaluation scripts under scripts/v1_5/eval/ to assess the performance of your fine-tuned model on various benchmarks. Use greedy decoding so that evaluation stays consistent with the real-time chat outputs; an example invocation follows.
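
For example, assuming the evaluation scripts follow the upstream LLaVA layout (the benchmark script name below is illustrative; list the folder to see what is actually provided):

```
ls scripts/v1_5/eval/             # see which benchmark scripts are available
bash scripts/v1_5/eval/mmvet.sh   # hypothetical example: run a single benchmark
```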

We thank the authors of LLaVA and Vicuna, whose work the original state of this repo is based on.
