ZhengxiangShi / DePT

[ICLR 2024] This is the repository for the paper titled "DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning"

Home Page: http://arxiv.org/abs/2309.05173

LLaMA 2 finetuning

Ahmed-Roushdy opened this issue

Thanks for the good work and the clear, neat implementation. I have some questions regarding the LLaMA 2 implementation:

  • Q.1 I can see in the train.py file that you support the LLaMA 2 model. However, line 639 of train.py instantiates the trainer from the PEFTSeq2SeqTrainer class, and line 659 then calls its train method. When I checked the implementation of PEFTSeq2SeqTrainer, it does not seem to define a train method. Does this mean that fine-tuning LLaMA (a decoder-based model) is not supported, or am I missing something?

  • Q.2 Also, in the same class, line 75 calls generation_inputs = inputs[self.model.main_input_name]. My question is: does LLaMA 2 have a main_input_name?

  • Q.3 Is the code runnable for LLaMA 2 on all of the provided datasets? In the paper, I can see LLaMA 2 results only for the sst2 dataset.

Thanks, and looking forward to your response.

Thanks for your question and sorry for my delayed response.

Regarding Q1, PEFTSeq2SeqTrainer inherits from the Trainer class, which has a train method.
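
For illustration, here is a minimal sketch (with a hypothetical class name, not the repo's actual definition) showing that a subclass of Seq2SeqTrainer picks up train() through inheritance:

# train() is inherited from transformers.Trainer via Seq2SeqTrainer,
# so the subclass does not need to define it itself.
from transformers import Seq2SeqTrainer

class MyPEFTSeq2SeqTrainer(Seq2SeqTrainer):
    pass  # hypothetical stand-in for PEFTSeq2SeqTrainer

print(hasattr(MyPEFTSeq2SeqTrainer, "train"))   # True
print(MyPEFTSeq2SeqTrainer.train.__qualname__)  # 'Trainer.train'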

Regarding Q2, I am not sure about this.
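
For what it's worth, main_input_name is a class attribute that Hugging Face models inherit from PreTrainedModel (the default is "input_ids"), so it can be inspected without loading any weights. A small snippet, assuming a recent transformers release that ships LlamaForCausalLM:

# main_input_name is a class attribute inherited from
# transformers.PreTrainedModel, so no checkpoint is needed to check it.
from transformers import LlamaForCausalLM

print(LlamaForCausalLM.main_input_name)  # 'input_ids'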

Regarding Q3, does the following work for you?

MODEL=llama-2-7b
MAX_LENGTH=64        # --max_seq_length
MAX_STEPS=50000      # --max_steps
PREFIX_LENGTH=40     # --prefix_length (soft prompt length)
R=60                 # --r (rank of the low-rank update)
for TASK_NAME in sst2; do
  for LORA_LR in 5e-3 3e-1 5e-4; do    # sweep over --lora_embedding_lr
      for lr in 3e-1 4e-1; do          # sweep over --learning_rate
            python train.py \
                --peft_type PROMPT_TUNING_LORA \
                --lora_embedding_lr ${LORA_LR} \
                --learning_rate ${lr} \
                --prefix_length ${PREFIX_LENGTH} \
                --r ${R} \
                --task_name ${TASK_NAME} \
                --dataset_config_name en \
                --model_name_or_path your_path/${MODEL} \
                --do_train \
                --do_eval \
                --do_predict \
                --per_device_train_batch_size 32 \
                --per_device_eval_batch_size 32 \
                --max_seq_length ${MAX_LENGTH} \
                --save_strategy steps \
                --evaluation_strategy steps \
                --max_steps ${MAX_STEPS} \
                --eval_steps 1000 \
                --save_steps 1000 \
                --warmup_steps 500 \
                --weight_decay 1e-5 \
                --load_best_model_at_end \
                --save_total_limit 1 \
                --output_dir saved_${MODEL}/${TASK_NAME}_lr${lr}_loralr${LORA_LR}_pl${PREFIX_LENGTH}_r${R}_st${MAX_STEPS};
        done;
    done;
done
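
For reference, the loops above grid-search three values of LORA_LR against two values of lr (six runs for sst2), and each run writes its checkpoints and predictions to a separate output_dir, so the runs do not overwrite one another.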

Thank you for the help. I will try it and let you know.