Question: CUDA memory usage in the evaluation phase
LimboWK opened this issue · comments
I have customized SFT and evaluation scripts using QLoRA, but I run out of GPU memory during the evaluation steps. Does anyone have the same issue, or any insight into how to reduce memory usage during eval?
The trainer and dataset setup looks like this:
#######################################################################
gradient_accumulation_steps = 4
per_device_train_batch_size = 4
per_device_eval_batch_size = 1

total_train_samples = len(train_data)
total_validation_samples = len(validation_data)
print("*** Total training samples:", total_train_samples)
print("*** Total validation samples:", total_validation_samples)

num_train_steps_per_epoch = (
    total_train_samples // per_device_train_batch_size // gradient_accumulation_steps
)
print("*** num_train_steps_per_epoch:", num_train_steps_per_epoch)

num_train_epochs = 1
max_steps = int(num_train_epochs * num_train_steps_per_epoch)
print("*** Max steps:", max_steps)

# trainer
trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=validation_data,
    compute_metrics=compute_bleu_score,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    args=transformers.TrainingArguments(
        per_device_train_batch_size=per_device_train_batch_size,
        per_device_eval_batch_size=per_device_eval_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        warmup_steps=2,
        max_steps=max_steps,
        learning_rate=1e-4,
        evaluation_strategy="steps",
        eval_steps=50,
        save_steps=50,
        logging_steps=10,
        save_total_limit=2,
        fp16=True,
        output_dir="outputs",
        optim="paged_adamw_8bit",
    ),
)
model.config.use_cache = False
per_device_train_batch_size = 1  # note: reassigning this after the Trainer is constructed has no effect
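One thing worth checking (a sketch, not a confirmed fix for this setup): when compute_metrics is set, the Trainer accumulates prediction logits for the whole eval set, and full [batch, seq_len, vocab] tensors can dwarf training memory. Two standard knobs are eval_accumulation_steps, which offloads accumulated predictions to CPU periodically, and preprocess_logits_for_metrics, which lets you shrink the logits (e.g. to argmax token ids) before they are accumulated. Assuming the same model/data objects as the snippet above:

```python
import torch
import transformers

def preprocess_logits_for_metrics(logits, labels):
    # Keep only predicted token ids instead of full-vocabulary logits,
    # so the Trainer accumulates [batch, seq_len] ints rather than
    # [batch, seq_len, vocab_size] floats during evaluation.
    if isinstance(logits, tuple):
        logits = logits[0]
    return logits.argmax(dim=-1)

trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=validation_data,
    compute_metrics=compute_bleu_score,
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    args=transformers.TrainingArguments(
        per_device_eval_batch_size=1,
        eval_accumulation_steps=1,  # move predictions off the GPU every eval step
        # ... rest of the arguments as in the original snippet ...
        output_dir="outputs",
    ),
)
```

Note that compute_bleu_score would then receive token ids rather than logits, so it may need a matching tokenizer.batch_decode step.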
I also encountered this problem. Did you solve it later?