Reproducing sentiment fine-tuning: train_lora extremely slow
vikigenius opened this issue
Vikash commented
I am trying to reproduce the fine-tuning for fingpt-sentiment_llama2-13b_lora.
The table claims this can be done on a single RTX 3090 within a day.
I am using an L4 GPU instead.
I downloaded the models to base_models and the dataset to data as described.
I launched the training script like this:
```sh
deepspeed train_lora.py \
  --run_name sentiment-llama2-13b-20epoch-64batch \
  --base_model llama2-13b-nr \
  --dataset sentiment-train \
  --max_length 512 \
  --batch_size 16 \
  --learning_rate 1e-4 \
  --num_epochs 20
```
I got an OOM error.
So I set load_in_8bit=True in the model loading code.
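Concretely, the change looks something like this (a minimal sketch of the standard transformers/bitsandbytes path; the exact loading code in train_lora.py and the local model path may differ on your setup):

```python
# Minimal sketch of loading the base model in 8-bit via bitsandbytes.
# "base_models/llama2-13b-nr" is the local path from my setup; adjust as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "base_models/llama2-13b-nr"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,   # int8 weights roughly halve GPU memory vs fp16
    device_map="auto",   # let accelerate place layers on the available GPU
)
```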
But now fine-tuning is extremely slow: a single epoch is estimated to take 2 days.
ynjiun commented
Two things you might want to consider to speed things up:
- --base_model llama2-13b-nr => --base_model llama2-7b-nr (see the adjusted command after this list)
- use an RTX 3090, which has higher memory bandwidth and more CUDA cores than the L4
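For example, your same launch command with only the base model swapped (assuming the 7B checkpoint is also downloaded to base_models):

```sh
deepspeed train_lora.py \
  --run_name sentiment-llama2-7b-20epoch-64batch \
  --base_model llama2-7b-nr \
  --dataset sentiment-train \
  --max_length 512 \
  --batch_size 16 \
  --learning_rate 1e-4 \
  --num_epochs 20
```

With the 7B model, the weights (~14 GB in fp16) may even fit on the L4's 24 GB without load_in_8bit, which would also remove the int8 dequantization overhead that slows down training.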