artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs

Home Page: https://arxiv.org/abs/2305.14314

Question: CUDA memory usage in the evaluation phase

LimboWK opened this issue · comments

I have customized SFT and evaluation scripts using QLoRA, but I run out of GPU memory during the evaluation steps. Does anyone have the same issue, or any insight into how to reduce memory usage during eval?

The trainer and dataset setup look like this:

#######################################################################
gradient_accumulation_steps = 4
per_device_train_batch_size = 4
per_device_eval_batch_size = 1
total_train_samples = len(train_data)
total_validation_samples = len(validation_data)
print("*** Total training samples:", total_train_samples)
print("*** Total validation samples:", total_validation_samples)

num_train_steps_per_epoch = (total_train_samples // per_device_train_batch_size // gradient_accumulation_steps)
print('*** num_train_steps_per_epoch: ', num_train_steps_per_epoch)
num_train_epochs = 1
max_steps = int(num_train_epochs * num_train_steps_per_epoch)
print('*** Max steps:', max_steps)

# trainer

trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=validation_data,
    compute_metrics=compute_bleu_score,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    args=transformers.TrainingArguments(
        per_device_train_batch_size=per_device_train_batch_size,
        per_device_eval_batch_size=per_device_eval_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        warmup_steps=2,
        max_steps=max_steps,
        learning_rate=1e-4,
        evaluation_strategy="steps",
        eval_steps=50,
        save_steps=50,
        logging_steps=10,
        save_total_limit=2,
        fp16=True,
        output_dir="outputs",
        optim="paged_adamw_8bit",
    ),
)
model.config.use_cache = False
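
One likely culprit when compute_metrics is set: the Trainer accumulates the raw logits of every eval batch (shape batch x seq_len x vocab_size) on the GPU before the metric is computed, so memory grows with the size of the validation set even at per_device_eval_batch_size=1. A minimal sketch of a workaround, assuming compute_bleu_score can be adapted to take predicted token IDs instead of raw logits, is to pass a preprocess_logits_for_metrics hook that shrinks the logits before they are accumulated:

# Sketch only: reduce each eval batch's logits to predicted token IDs so the
# Trainer does not accumulate (batch, seq_len, vocab_size) float tensors.
def preprocess_logits_for_metrics(logits, labels):
    if isinstance(logits, tuple):
        # some causal LM heads return (logits, past_key_values, ...)
        logits = logits[0]
    return logits.argmax(dim=-1)

trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=validation_data,
    compute_metrics=compute_bleu_score,  # would now receive token IDs, not logits
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    args=transformers.TrainingArguments(
        per_device_train_batch_size=per_device_train_batch_size,
        per_device_eval_batch_size=per_device_eval_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        warmup_steps=2,
        max_steps=max_steps,
        learning_rate=1e-4,
        evaluation_strategy="steps",
        eval_steps=50,
        save_steps=50,
        logging_steps=10,
        save_total_limit=2,
        fp16=True,
        output_dir="outputs",
        optim="paged_adamw_8bit",
    ),
)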

per_device_train_batch_size = 1
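
Reducing the train batch size mainly helps the training steps; in the snippet above per_device_eval_batch_size is already 1. For the evaluation loop itself, TrainingArguments also has eval_accumulation_steps, which moves the accumulated prediction tensors from the GPU to the CPU every N eval steps instead of holding them all until evaluation finishes. A minimal sketch (the value 4 is arbitrary):

args = transformers.TrainingArguments(
    output_dir="outputs",
    per_device_eval_batch_size=1,
    # move accumulated eval outputs to the CPU every 4 prediction steps
    # instead of keeping them all in GPU memory until eval ends
    eval_accumulation_steps=4,
)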

I also encountered this problem. Did you solve it later?