artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs

Home Page: https://arxiv.org/abs/2305.14314


multi gpu uneven VRAM utilization

ehartford opened this issue · comments

Hello,
when I train with multiple GPUs like this:

```
WORLD_SIZE=8 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 qlora.py \
```

Then I get uneven VRAM utilization:

[screenshot 2023-08-08_11-27-38: uneven VRAM utilization across the 8 GPUs]
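For reference, per-GPU memory usage like the screenshot shows can be checked from the command line with `nvidia-smi` (requires an NVIDIA driver; output will vary by machine):

```shell
# Print used/total memory for each GPU as CSV; uneven allocation
# across devices shows up as differing memory.used values.
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
```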

This means I have to use a smaller per-device batch size than I otherwise could, which makes my training run take about 30% longer than it should.

I don't have this problem when doing multi-GPU training in full precision (non-QLoRA) using accelerate or deepspeed.

What model / other parameters are you using with torchrun? I personally try to stay away from torchrun and use accelerate instead.
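For an accelerate-based launch, a minimal sketch might look like the following; the script name mirrors the torchrun invocation above, and the remaining qlora.py arguments (elided here) would need to be carried over:

```shell
# Hypothetical equivalent of the 8-GPU torchrun command using
# `accelerate launch`; --num_processes plays the role of --nproc_per_node.
accelerate launch --multi_gpu --num_processes 8 qlora.py \
```

Running `accelerate config` first lets you bake the process count and GPU selection into a config file instead of passing flags each time.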

I'm having good success using this fork: https://github.com/ChrisHayduk/qlora-multi-gpu/

The models show roughly even VRAM usage: 40.97 GB vs. 40.95 GB.
