Reproducing sentiment fine-tuning: train_lora extremely slow
vikigenius opened this issue
Vikash commented
I am trying to reproduce the fine-tuning for fingpt-sentiment_llama2-13b_lora.
The table claims this can be done on a single RTX 3090 within a day.
I am using an L4 GPU instead.
I downloaded the models to base_models and the dataset to data as described.
I launched the training script like this:
```sh
deepspeed train_lora.py \
  --run_name sentiment-llama2-13b-20epoch-64batch \
  --base_model llama2-13b-nr \
  --dataset sentiment-train \
  --max_length 512 \
  --batch_size 16 \
  --learning_rate 1e-4 \
  --num_epochs 20
```
I got an OOM error.
So I set load_in_8bit=True in the model loading code.
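Concretely, the change looks something like this (a minimal sketch of the standard transformers/bitsandbytes path; the exact loading code in train_lora.py and the local model path may differ on your setup):

```python
# Minimal sketch of loading the base model in 8-bit via bitsandbytes.
# "base_models/llama2-13b-nr" is the local path from my setup; adjust as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "base_models/llama2-13b-nr"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,   # int8 weights roughly halve GPU memory vs fp16
    device_map="auto",   # let accelerate place layers on the available GPU
)
```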
But now fine-tuning is extremely slow: a single epoch is estimated to take 2 days.
ynjiun commented
Two things you might want to consider to speed things up:
- --base_model llama2-13b-nr => --base_model llama2-7b-nr (see the adjusted command after this list)
- use an RTX 3090, which has higher memory bandwidth and more CUDA cores than the L4
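For example, your same launch command with only the base model swapped (assuming the 7B checkpoint is also downloaded to base_models):

```sh
deepspeed train_lora.py \
  --run_name sentiment-llama2-7b-20epoch-64batch \
  --base_model llama2-7b-nr \
  --dataset sentiment-train \
  --max_length 512 \
  --batch_size 16 \
  --learning_rate 1e-4 \
  --num_epochs 20
```

With the 7B model, the weights (~14 GB in fp16) may even fit on the L4's 24 GB without load_in_8bit, which would also remove the int8 dequantization overhead that slows down training.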