Weird CUDA memory utilization
lwmlyy opened this issue · comments
Hi, I am using the python launch to LoRA-finetune Llama2-70b, and the training is going well. But it seems a bit weird that memory utilization is quite low, less than 18 GB. Also, training is relatively slow compared to the codebase in llama-recipes.
Hi! Thanks for your interest. Have you tried accelerate? That worked for us! The python way also works, but is very slow. Definitely try accelerate, but if you don’t want to I’d at least switch to 4 A100 80gb GPUs.
First run `accelerate config` to set up accelerate, and then replace `python finetune.py` with `accelerate launch finetune.py`. If that doesn't work, I'll be happy to get you a script.
To clarify, running `python finetune.py` will not run as quickly on 4 GPUs as on 8, but when we tried it the native python way, 8 GPUs seemed a bit of a waste since, as you noticed, utilization isn't great.
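For reference, `accelerate config` writes its answers to `~/.cache/huggingface/accelerate/default_config.yaml`, and a single-node multi-GPU setup might look roughly like this (the values below are illustrative for one 8-GPU machine, not taken from this issue):

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
machine_rank: 0
num_machines: 1
num_processes: 8
mixed_precision: bf16
```

`num_processes` should match the number of GPUs you intend to use (8 here, or 4 if you follow the 4×A100 suggestion above).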
I just tried running the script with `accelerate launch` (8×A100-80GB), but it went CUDA OOM during model loading. Any advice?
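An OOM at load time is plausible if each rank materializes a full-precision copy of the weights before any sharding happens. A back-of-the-envelope sketch (the 70B parameter count and byte sizes are round-number assumptions, not measurements from this setup):

```python
# Rough GPU memory needed just to hold Llama-2-70B weights,
# assuming 70e9 parameters (real checkpoints differ slightly).
PARAMS = 70e9

def weight_gb(bytes_per_param: float) -> float:
    """Gigabytes required to store all weights at a given precision."""
    return PARAMS * bytes_per_param / 1e9

fp16 = weight_gb(2)    # 140 GB -> exceeds a single 80 GB A100
int4 = weight_gb(0.5)  # 35 GB  -> fits comfortably

print(f"fp16: {fp16:.0f} GB, 4-bit: {int4:.0f} GB")
```

So if every one of the 8 processes tries to hold its own fp16 copy during loading, 8×80 GB still OOMs at that step; sharded or quantized loading avoids materializing the full-precision copy per rank.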
Same problem. I solved it by reinstalling the Python packages with the versions in requirements.txt; I think it is related to the peft package.
But after that it still runs out of CUDA memory when cutoff_len is bigger than 1024.
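The sensitivity to `cutoff_len` is consistent with attention activations growing quadratically in sequence length. A rough sketch of just the attention score matrices (the head count, layer count, and batch size below are illustrative assumptions, not Llama-2-70B's exact config):

```python
def attn_scores_gb(seq_len: int, heads: int = 64, layers: int = 80,
                   batch: int = 1, bytes_per_el: int = 2) -> float:
    """Memory for the seq_len x seq_len attention score matrices alone,
    summed over layers, in GB (ignores other activations kept for backward)."""
    return batch * heads * layers * seq_len ** 2 * bytes_per_el / 1e9

# Doubling cutoff_len roughly quadruples this term:
print(attn_scores_gb(1024), attn_scores_gb(2048))
```

That quadratic term is one reason a run that fits at `cutoff_len=1024` can OOM at 2048 even though the weights themselves fit.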