* Equally contributing first authors
- Apr-26-24 - Phi-3-V and LLaMA-3-V released: Excited to release the new integration of LLaVA with Phi-3 Mini Instruct and LLaMA-3 Instruct models! 🔥🔥🔥
This repository enhances the capabilities of the LLaVA 1.5 model, incorporating the latest LLMs released this week 🔥: Phi-3 Mini Instruct 3.8B and LLaMA-3 Instruct 8B.
Model | MMMU | POPE | MME | MMBench-en | MMBench-cn | SEED-all | SEED-img | SEED-vid | LLaVA-Wild | GQA | Science-QA | Average |
---|---|---|---|---|---|---|---|---|---|---|---|---|
LLaVA-v1.5-7B | 35.4 | 85.8 | 1510.7 | 64.3 | 58.3 | 58.6 | 66.1 | 37.3 | 65.4 | 62.0 | 66.8 | 58.9 |
LLaVA-v1.5-13B | 36.4 | 85.9 | 1531.3 | 67.7 | 63.6 | 61.6 | 68.2 | 42.7 | 72.5 | 63.3 | 71.6 | 62.3 |
Phi-3-V-mini-3.8B | 37.8 | 85.6 | 1470.1 | 68.2 | 68.1 | 62.8 | 67.7 | 44.5 | 70.9 | 61.7 | 80.7 | 63.2 |
🚀 LLaMA-3-V-8B results and models: coming soon!
*Average computed excluding MME
The following table provides an overview of the available models in our zoo. For each model, you can find links to its Hugging Face page.
Model Name | Hugging Face Link | Summary |
---|---|---|
LLaVA-Phi-3-mini-4k-instruct-pretrain | HF | Pretrained on LCS-558K. |
LLaVA-Phi-3-mini-4k-instruct-lora | HF | LoRA weights fine-tuned on LLaVA-Instruct-665K. |
LLaVA-Phi-3-mini-4k-instruct | HF | Merged weights in HuggingFace format. |
```shell
git clone https://github.com/mbzuai-oryx/LLaVA-pp.git
cd LLaVA-pp
git submodule update --init --recursive
```
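If the clone step above was skipped or interrupted, the `LLaVA` submodule directory may be empty. A small hedged check (the `check_submodule` helper is our own sketch, not a repo script):

```shell
# Hypothetical sanity check: confirm the LLaVA submodule was actually
# initialized before patching files into it.
check_submodule() {
  # $1: a file that should exist once the submodule is checked out
  if [ -f "$1" ]; then
    echo "LLaVA submodule present"
  else
    echo "run: git submodule update --init --recursive" >&2
    return 1
  fi
}
# Example (from the repo root):
# check_submodule LLaVA/llava/__init__.py
```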
Update the following package for LLaVA (a specific transformers commit is required):

```shell
pip install git+https://github.com/huggingface/transformers@a98c41798cf6ed99e1ff17e3792d6e06a2ff2ff3
```
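The command above pins transformers to a specific commit. When bumping that pin later, a quick check that the new pin is a full 40-character SHA can save a failed install (the `is_full_sha` helper is a hypothetical convenience, not part of the repo):

```shell
# Hypothetical helper: verify a git pin is a full lowercase 40-char SHA
# before putting it into a pip install command.
is_full_sha() {
  printf '%s' "$1" | grep -Eq '^[0-9a-f]{40}$'
}
is_full_sha a98c41798cf6ed99e1ff17e3792d6e06a2ff2ff3 && echo "pin looks valid"
```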
To integrate Phi-3-V with LLaVA, follow these steps to update the codebase:
```shell
# Copy necessary files
cp Phi-3-V/train.py LLaVA/llava/train/train.py
cp Phi-3-V/llava_phi3.py LLaVA/llava/model/language_model/llava_phi3.py
cp Phi-3-V/builder.py LLaVA/llava/model/builder.py
cp Phi-3-V/model__init__.py LLaVA/llava/model/__init__.py
cp Phi-3-V/main__init__.py LLaVA/llava/__init__.py
cp Phi-3-V/conversation.py LLaVA/llava/conversation.py

# Training scripts
cp scripts/Phi3-V_pretrain.sh LLaVA/Phi3-V_pretrain.sh
cp scripts/Phi3-V_finetune_lora.sh LLaVA/Phi3-V_finetune_lora.sh
```
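The cp steps above overwrite files inside the LLaVA submodule. A hedged convenience wrapper (our own sketch, not a repo script) that fails fast when a source file is missing instead of silently skipping it:

```shell
# Hypothetical wrapper around the copy steps above.
copy_override() {
  src="$1"; dst="$2"
  if [ ! -f "$src" ]; then
    echo "missing source: $src" >&2
    return 1
  fi
  cp "$src" "$dst"
  echo "copied $src -> $dst"
}
# Example (from the repo root):
# copy_override Phi-3-V/train.py LLaVA/llava/train/train.py
```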
- Pre-train

```shell
cd LLaVA
bash Phi3-V_pretrain.sh
```

- Finetune

```shell
cd LLaVA
bash Phi3-V_finetune_lora.sh
```
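For unattended runs, the two stages above can be chained with a per-stage log file. This is a sketch of our own (`run_stage` is not a repo script); it assumes the training scripts were copied into LLaVA/ as shown:

```shell
# Hypothetical helper: run a training script and tee its output to a
# log named after the script (foo.sh -> foo.log).
run_stage() {
  script="$1"
  log="${script%.sh}.log"
  echo "running $script (log: $log)"
  bash "$script" 2>&1 | tee "$log"
}
# cd LLaVA
# run_stage Phi3-V_pretrain.sh
# run_stage Phi3-V_finetune_lora.sh
```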
To integrate LLaMA-3-V with LLaVA, follow these steps to update the codebase:

```shell
# Copy necessary files
cp LLaMA-3-V/train.py LLaVA/llava/train/train.py
cp LLaMA-3-V/conversation.py LLaVA/llava/conversation.py

# Training scripts
cp scripts/LLaMA3-V_pretrain.sh LLaVA/LLaMA3-V_pretrain.sh
cp scripts/LLaMA3-V_finetune_lora.sh LLaVA/LLaMA3-V_finetune_lora.sh
```
- Pre-train

```shell
cd LLaVA
bash LLaMA3-V_pretrain.sh
```

- Finetune

```shell
cd LLaVA
bash LLaMA3-V_finetune_lora.sh
```
We are thankful to LLaVA and lmms-eval for releasing their models and code as open-source contributions.