yestaehyung / LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Repository forked in order to review the code of the LLaMA-Adapter model.

Official implementation of 'LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention' and 'LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model'.

This repo proposes LLaMA-Adapter (V2), a lightweight adaptation method for fine-tuning instruction-following and multi-modal LLaMA models 🔥.

Released Models

| Name | Approach | Data | Modality | Visual | Text |
|---|---|---|---|---|---|
| LLaMA-Adapter V1 | prefix, gate | Alpaca | Text | × | LLaMA-7B |
| LLaMA-Adapter V2 dialog | scale, bias, norm | ShareGPT | Text | × | LLaMA-65B |
| LLaMA-Adapter V2 multimodal | [P] prefix, projection, gate; [F] bias, norm | [P] Image-Text-V1; [F] GPT4LLM, LLaVA | Image&Text | CLIP-ViT-L/14 | LLaMA-7B |
| LLaMA-Adapter V2.1 multimodal | [P] prefix, projection, gate; [F] bias, norm, lora | [P] Image-Text-V1; [F] GPT4LLM, LLaVA, VQAv2 | Image&Text | CLIP-ViT-L/14 | LLaMA-7B |
| ImageBind-LLM | [P] prefix, projection, gate; [F] bias, norm, lora | [P] Image-Text-V1; [F] Instruction Following | ImageBind Modalities + Point Cloud | imagebind_huge | Open-Chinese-LLaMA-7B |
| ImageBind-dialog | [P] prefix, projection, gate; [F] bias, norm, lora | [P] Image-Text-V1; [F] LLaVA, ShareGPT | ImageBind Modalities + Point Cloud | imagebind_huge | Open-Chinese-LLaMA-7B |
  • [P] means Pre-train and [F] means Fine-tune
  • Image-Text-V1 is a concatenation of LAION400M, COYO, MMC4, SBU, Conceptual Captions, and COCO
  • ImageBind Modalities include image, video, text, audio, depth, thermal, IMU
  • ImageBind-dialog will be released soon

Overview

Efficiency Comparison:

| Model | Parameters | Storage Space | Training Time |
|---|---|---|---|
| Alpaca | 7B | 13G | 3 Hours |
| LLaMA-Adapter | 1.2M | 4.7M | 1 Hour |

By inserting adapters into LLaMA's transformer, our method introduces only 1.2M learnable parameters and turns LLaMA into an instruction-following model within 1 hour. To stabilize training at the early stages, we propose a novel zero-init attention with a zero gating mechanism that adaptively incorporates the instructional signals. After fine-tuning, LLaMA-Adapter can generate high-quality instruction-following responses, comparable to those of the fully fine-tuned Stanford Alpaca and Alpaca-Lora.
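
For intuition, here is a simplified PyTorch sketch of the idea (not the repository's actual implementation): a learnable adaptation prompt is attended to by the input queries, and a gating factor initialized to zero scales its contribution, so training starts from the frozen model's original behaviour. The class name ZeroGatedPromptAttention is illustrative only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ZeroGatedPromptAttention(nn.Module):
    """Simplified sketch of zero-init attention: a learnable prompt plus a
    zero-initialized gate, so the adapter contributes nothing at step 0."""

    def __init__(self, dim: int, n_heads: int, prompt_len: int):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.wq = nn.Linear(dim, dim, bias=False)   # frozen LLaMA weights in practice
        self.wk = nn.Linear(dim, dim, bias=False)   # frozen LLaMA weights in practice
        self.wv = nn.Linear(dim, dim, bias=False)   # frozen LLaMA weights in practice
        self.wo = nn.Linear(dim, dim, bias=False)   # frozen LLaMA weights in practice
        # learnable adaptation prompt (this is where the trainable parameters live)
        self.prompt = nn.Parameter(torch.zeros(prompt_len, dim))
        # zero-initialized gate: the adapter signal is ignored at the start of training
        self.gate = nn.Parameter(torch.zeros(n_heads, 1, 1))

    def _heads(self, t: torch.Tensor) -> torch.Tensor:
        b, n, _ = t.shape
        return t.view(b, n, self.n_heads, self.head_dim).transpose(1, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bsz, seqlen, dim = x.shape
        q, k, v = self._heads(self.wq(x)), self._heads(self.wk(x)), self._heads(self.wv(x))

        # ordinary self-attention over the input tokens (the frozen LLaMA path)
        out = F.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1) @ v

        # attention from the same queries to the learnable prompt tokens
        p = self.prompt.unsqueeze(0).expand(bsz, -1, -1)
        pk, pv = self._heads(self.wk(p)), self._heads(self.wv(p))
        p_out = F.softmax(q @ pk.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1) @ pv

        # the zero-init gate decides how much adapter signal to inject
        out = out + torch.tanh(self.gate) * p_out
        return self.wo(out.transpose(1, 2).reshape(bsz, seqlen, dim))

In the actual method the prompts are only inserted at the topmost transformer layers and the gated prompt scores are folded into the attention computation; the sketch above separates the two attention terms purely for readability.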

Our approach can be easily extended to multi-modal input instructions. The reasoning framework of the image-conditioned LLaMA-Adapter for ScienceQA is also shared by other modalities, such as audio and video.
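
As a rough illustration of this extension (again a sketch under assumptions, not the repository's code), a global visual feature, e.g. a pooled CLIP-ViT-L/14 embedding, can be projected to LLaMA's hidden size and added to the adaptation prompts, so the same zero-gated attention path conditions generation on the image. The class name VisualPromptInjector and the pooled-feature input are assumptions for illustration.

import torch
import torch.nn as nn

class VisualPromptInjector(nn.Module):
    """Sketch: fuse a projected visual feature into the learnable adaptation prompts."""

    def __init__(self, visual_dim: int, llama_dim: int, prompt_len: int):
        super().__init__()
        self.proj = nn.Linear(visual_dim, llama_dim)    # maps encoder features to LLaMA's width
        self.prompt = nn.Parameter(torch.zeros(prompt_len, llama_dim))

    def forward(self, visual_feature: torch.Tensor) -> torch.Tensor:
        # visual_feature: (batch, visual_dim), e.g. a pooled image embedding
        v = self.proj(visual_feature).unsqueeze(1)      # (batch, 1, llama_dim)
        return self.prompt.unsqueeze(0) + v             # (batch, prompt_len, llama_dim)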

Setup

Here is a from-scratch script for LLaMA-Adapter V1.

conda create -n llama_adapter -y python=3.8
conda activate llama_adapter

# install pytorch
conda install pytorch cudatoolkit -c pytorch -y

# install dependency and llama-adapter
pip install -r requirements.txt
pip install -e .
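
Optionally, a quick sanity check (not part of the repository's scripts) confirms that the PyTorch install can see a GPU before moving on:

# optional sanity check: verify the PyTorch install and CUDA visibility
import torch
print(torch.__version__)           # installed PyTorch version
print(torch.cuda.is_available())   # should print True on a CUDA-capable machine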

Note: To setup other models, please refer to llama_adapter_v2_chat65b, llama_adapter_v2_multimodal7b and imagebind_LLM for more details.

Inference

Please request access to the pre-trained LLaMA from this form (official) or download LLaMA-7B from Hugging Face (unofficial). Then, obtain the weights of our LLaMA-Adapter from here. We denote the paths to the downloaded LLaMA weights and adapter weights as TARGET_FOLDER and ADAPTER_PATH, respectively.

Here is an example of generating instruction-following responses with the 7B LLaMA model and our LLaMA-Adapter (replace model_size with the folder of the checkpoint you downloaded, e.g. 7B):

torchrun --nproc_per_node 1 example.py \
         --ckpt_dir $TARGET_FOLDER/model_size \
         --tokenizer_path $TARGET_FOLDER/tokenizer.model \
         --adapter_path $ADAPTER_PATH

Training

We release the simple fine-tuning code of LLaMA-Adapter on the LLaMA-7B model here, which allows effortless reproduction with minimal dependencies. We will soon release the fine-tuning code for LLaMA-65B and the multi-modal LLaMA-Adapter.

Please download the 52K instruction-following training data from Stanford Alpaca and put it under DATA_PATH. Then run:

cd alpaca_finetuning_v1

torchrun --nproc_per_node 8 finetuning.py \
         --model Llama7B_adapter \
         --llama_model_path $TARGET_FOLDER/ \
         --data_path $DATA_PATH/alpaca_data.json \
         --adapter_layer 30 \
         --adapter_len 10 \
         --max_seq_len 512 \
         --batch_size 4 \
         --epochs 5 \
         --warmup_epochs 2 \
         --blr 9e-3 \
         --weight_decay 0.02 \
         --output_dir ./checkpoint/
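
For orientation, the defaults above are consistent with the 1.2M figure in the efficiency table: LLaMA-7B has a hidden size of 4096, so the learnable prompts alone account for roughly adapter_layer × adapter_len × 4096 parameters, with the per-layer gates adding a negligible amount. A quick back-of-the-envelope check:

# rough count of the learnable adaptation-prompt parameters (gates ignored)
adapter_layer, adapter_len, hidden_dim = 30, 10, 4096   # defaults above; 4096 is LLaMA-7B's hidden size
print(adapter_layer * adapter_len * hidden_dim)          # 1228800, i.e. about 1.2M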

Citation

If you find our LLaMA-Adapter code and paper useful, please kindly cite:

@article{zhang2023llamaadapter,
  title = {LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention},
  author={Zhang, Renrui and Han, Jiaming and Liu, Chris and Gao, Peng and Zhou, Aojun and Hu, Xiangfei and Yan, Shilin and Lu, Pan and Li, Hongsheng and Qiao, Yu},
  journal={arXiv preprint arXiv:2303.16199},
  year={2023}
}

If you find our LLaMA-Adapter V2 code and paper useful, please kindly cite:

@article{gao2023llamaadapterv2,
  title = {LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model},
  author={Gao, Peng and Han, Jiaming and Zhang, Renrui and Lin, Ziyi and Geng, Shijie and Zhou, Aojun and Zhang, Wei and Lu, Pan and He, Conghui and Yue, Xiangyu and Li, Hongsheng and Qiao, Yu},
  journal={arXiv preprint arXiv:2304.15010},
  year={2023}
}

Acknowledgement

This repo benefits from LLaMA, Stanford Alpaca, and Alpaca-Lora. Thanks for their wonderful work.

About


License: GNU General Public License v3.0


Languages

Python 95.1%, JavaScript 2.7%, Shell 1.5%, Rust 0.4%, Scheme 0.2%, C++ 0.1%