artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs

Home Page: https://arxiv.org/abs/2305.14314


multi gpu uneven VRAM utilization

ehartford opened this issue · comments

Hello,
when I train with multiple GPUs like this:

```
WORLD_SIZE=8 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 qlora.py \
```

Then I get uneven VRAM utilization:

[screenshot 2023-08-08_11-27-38: uneven VRAM utilization across the 8 GPUs]
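For reference, per-GPU memory usage like the screenshot shows can be checked from the command line with `nvidia-smi` (requires an NVIDIA driver; output will vary by machine):

```shell
# Print used/total memory for each GPU as CSV; uneven allocation
# across devices shows up as differing memory.used values.
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
```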

This means I have to use a smaller per-device batch size than I otherwise could, which makes my training run take about 30% longer than it should.

I don't have this problem when doing multi-GPU training in full precision (non-QLoRA) using accelerate or deepspeed.

What model / other parameters are you using with torchrun? I personally try to stay away from torchrun and use accelerate instead.
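For an accelerate-based launch, a minimal sketch might look like the following; the script name mirrors the torchrun invocation above, and the remaining qlora.py arguments (elided here) would need to be carried over:

```shell
# Hypothetical equivalent of the 8-GPU torchrun command using
# `accelerate launch`; --num_processes plays the role of --nproc_per_node.
accelerate launch --multi_gpu --num_processes 8 qlora.py \
```

Running `accelerate config` first lets you bake the process count and GPU selection into a config file instead of passing flags each time.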

I'm having good success using this fork: https://github.com/ChrisHayduk/qlora-multi-gpu/

The models show roughly even VRAM usage: 40.97 GB vs. 40.95 GB.
