[Bug] large CUDA memory usage in the evaluation phase
ChenMnZ opened this issue
Mengzhao Chen commented
I train llama-7b with the following batch size settings:
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 4 \
During training it consumes about 9 GB of GPU memory. However, during evaluation (the MMLU evaluation) the memory consumption increases to 27 GB. Is there a bug in the evaluation process?
TIANSHU ZHU commented
Set --eval_accumulation_steps.
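For context: in the Hugging Face Trainer, --eval_accumulation_steps controls how many prediction steps are accumulated on the GPU before the output tensors (e.g. the logits) are moved to the CPU. If it is left unset, predictions for the entire evaluation set are held on the GPU before being offloaded, which would explain the memory spike. A minimal sketch of the adjusted flags (the value 1 is illustrative and trades evaluation speed for memory):
--per_device_eval_batch_size 4 \
--eval_accumulation_steps 1 \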