VL-Instruct

Code for vision-language instruction tuning. Currently supports BLIP2-T5 and BLIP2-Vicuna.

Reference code:

  1. https://github.com/salesforce/BLIP
  2. https://github.com/salesforce/LAVIS

Setup

The vicuna7b checkpoint path is hard-coded in the bundled LAVIS package. Install it from the local source:

cd LAVIS
pip install -e .
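
To confirm that the local install is the copy being picked up, here is a minimal sketch using the standard LAVIS loader; the registry names shown ("blip2_t5" / "pretrain_flant5xl") are the upstream defaults, so substitute whichever entries this repo registers.

import torch
from lavis.models import load_model_and_preprocess

# Sanity check: load BLIP2-FlanT5-xl through the locally installed LAVIS copy.
# Swap in the vicuna entry registered by this repo to exercise the
# hard-coded checkpoint path instead.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip2_t5", model_type="pretrain_flant5xl", is_eval=True, device=device
)
print(type(model).__name__)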

Prepare the Data

Modify the data paths in the custom dataloader in data/vqa_dataset.py.
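
For illustration only, the sketch below shows the general shape of a VQA-style dataloader like the one in data/vqa_dataset.py; the paths (IMAGE_ROOT, ANNOTATION_FILE) and keys ("image", "question", "answer") are placeholders, not the repo's actual field names.

import json
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset

IMAGE_ROOT = Path("/path/to/images")          # placeholder, replace with your image root
ANNOTATION_FILE = Path("/path/to/vqa.json")   # placeholder, replace with your annotation file

class ToyVQADataset(Dataset):
    """Hypothetical VQA dataset: a JSON list of {image, question, answer} records."""

    def __init__(self, vis_processor=None):
        self.samples = json.loads(ANNOTATION_FILE.read_text())
        self.vis_processor = vis_processor

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        sample = self.samples[idx]
        image = Image.open(IMAGE_ROOT / sample["image"]).convert("RGB")
        if self.vis_processor is not None:
            image = self.vis_processor(image)
        return {
            "image": image,
            "text_input": sample["question"],
            "text_output": sample["answer"],
        }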

Train BLIP2-FlanT5-xl

python -m torch.distributed.run --nproc_per_node=8 train_vqa.py --model_type blip2_t5 --train_qformer

Train BLIP2-Vicuna-7b

TODO: double-check that the model is loaded properly. Pay attention to qformer_text_input, per this issue.

python -m torch.distributed.run --nproc_per_node=8 train_vqa.py --model_type blip2_vicuna --train_qformer
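
One quick way to confirm how the loaded model treats instruction text in the Q-Former, assuming the option surfaces as a qformer_text_input attribute as in LAVIS's InstructBLIP-style models; the registry names below are the upstream blip2_vicuna_instruct / vicuna7b entries, so substitute the ones this repo registers.

from lavis.models import load_model_and_preprocess

# Check whether the Q-Former is conditioned on the instruction text.
# Assumes the flag is exposed as `qformer_text_input`; adjust the names
# if this repo wires the option differently.
model, _, _ = load_model_and_preprocess(
    name="blip2_vicuna_instruct", model_type="vicuna7b", is_eval=True, device="cpu"
)
print("qformer_text_input:", getattr(model, "qformer_text_input", "attribute not found"))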

To also fine-tune the LLM:

python -m torch.distributed.run --nproc_per_node=8 train_vqa.py --model_type blip2_vicuna --train_qformer --train_llm
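
A hypothetical sketch of what the --train_qformer / --train_llm split usually amounts to: freeze everything, then re-enable gradients per flag. The Qformer and llm_model attribute names follow upstream LAVIS and may differ in this repo's train_vqa.py.

def select_trainable(model, train_qformer=True, train_llm=False):
    # Freeze all parameters, then unfreeze only the requested submodules.
    for param in model.parameters():
        param.requires_grad = False
    if train_qformer:
        for param in model.Qformer.parameters():
            param.requires_grad = True
    if train_llm:
        # `llm_model` is the LAVIS attribute name for the Vicuna decoder.
        for param in model.llm_model.parameters():
            param.requires_grad = True
    return [p for p in model.parameters() if p.requires_grad]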

Change the HF cache directory

export TRANSFORMERS_CACHE=/projects/nlp_lab/zhiyang/.cache/
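
The cache location can also be set from Python, as long as the variable is set before transformers is first imported, or overridden per call via the cache_dir argument; a small sketch (bert-base-uncased is just a small placeholder model):

import os

# Option 1: redirect the whole Hugging Face cache (must run before importing transformers).
os.environ["TRANSFORMERS_CACHE"] = "/projects/nlp_lab/zhiyang/.cache/"

# Option 2: override the cache per download instead of globally.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "bert-base-uncased", cache_dir="/projects/nlp_lab/zhiyang/.cache/"
)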

Install flash attention for CUDA older than 11.4

pip install flash-attn==2.1.1 --no-build-isolation
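
A quick import check to confirm the wheel installed cleanly against your CUDA toolkit:

# Verify that flash-attn installed and compiled for the local CUDA setup.
import flash_attn
print("flash-attn:", flash_attn.__version__)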
