CUDA error: out of memory

longnhatne opened this issue 3 years ago

Hi guys,
I get "CUDA error: out of memory" (even with batch size = 1) when I try to run the training script with this command:
CUDA_VISIBLE_DEVICES=2 python -c "import sys; sys.path.append('./'); from exp.tests.test_cips3d import Testing_ffhq_exp; Testing_ffhq_exp().test_train_ffhq(debug=False)" --tl_opts batch_size 1 img_size 32 total_iters 80000

I'm running on a V100 GPU with 32 GB of memory. What should I do?
Btw, I really appreciate your work. Great paper! 👏

Peterou · Answer 1 · Wed Jan 12 2022 10:58:41 GMT+0800 (China Standard Time)

How about running export CUDA_VISIBLE_DEVICES=2 first?
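Prefixing the command with CUDA_VISIBLE_DEVICES=2 and running export CUDA_VISIBLE_DEVICES=2 beforehand both place the variable in the environment of the Python process, so the two are expected to behave the same. If you launch training from another Python script instead, a minimal sketch of doing the same thing follows; launch_on_gpu is a hypothetical helper, not part of this repository. Inside the child process the selected GPU is renumbered as cuda:0.

```python
from __future__ import annotations

import os
import subprocess
import sys


def launch_on_gpu(gpu_index: int, python_args: list[str]) -> subprocess.CompletedProcess:
    """Run `python <python_args>` with only one GPU visible to the child process.

    Hypothetical helper for illustration. Setting CUDA_VISIBLE_DEVICES in the
    child's environment is equivalent to `export CUDA_VISIBLE_DEVICES=...` or to
    prefixing the shell command with it; the chosen GPU appears as cuda:0 inside
    the child.
    """
    env = os.environ.copy()  # inherit PATH, PYTHONPATH, etc.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return subprocess.run(
        [sys.executable, *python_args],
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Show what the child process sees; the training command from the issue
    # could be passed the same way, e.g. ["-c", "...", "--tl_opts", ...].
    result = launch_on_gpu(2, ["-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"])
    print(result.stdout.strip())
```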

Long-Nhật nè · Answer 2 · Wed Jan 12 2022 11:00:17 GMT+0800 (China Standard Time)

Still the same :((

Peterou · Answer 3 · Wed Jan 12 2022 11:11:26 GMT+0800 (China Standard Time)

The error seems to be caused by .to(device). Please check whether PyTorch can use the GPU by calling torch.cuda.is_available().
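A slightly broader check than torch.cuda.is_available() alone, assuming a standard PyTorch install: it also reports the CUDA version PyTorch was built with and each visible device with its total memory. cuda_report is a hypothetical helper name. With CUDA_VISIBLE_DEVICES=2 set, only one device (index 0) would be expected in the list.

```python
def cuda_report() -> dict:
    """Summarize what PyTorch can see; returns a dict so it can be printed or logged."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "available": False, "devices": []}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "available": torch.cuda.is_available(),
        "devices": [],
    }
    if report["available"]:
        # Indices here are after CUDA_VISIBLE_DEVICES remapping.
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            report["devices"].append(
                {"index": i, "name": props.name, "total_gib": props.total_memory / 2**30}
            )
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```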

Long-Nhật nè · Answer 4 · Wed Jan 12 2022 11:16:53 GMT+0800 (China Standard Time)

torch.cuda.is_available() returns True.
How much memory does the model take on your machine?
I don't think it should need more than 32 GB :((
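To answer "how much memory does the model take" on your own machine, one option is to measure the peak memory PyTorch allocates during a single training step. A minimal sketch, assuming you can wrap one iteration (forward and backward) in a function; step and peak_gpu_memory_gib are hypothetical names, not part of this repository. nvidia-smi will report more than this figure, because it also counts the CUDA context and memory cached by PyTorch's allocator.

```python
from __future__ import annotations

from typing import Callable, Optional


def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


def peak_gpu_memory_gib(step: Callable[[], None], device: int = 0) -> Optional[float]:
    """Run `step` once and return the peak memory PyTorch allocated on `device`, in GiB.

    Returns None if PyTorch or a CUDA device is unavailable. `step` stands in for
    one training iteration and is a placeholder supplied by the caller.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    torch.cuda.reset_peak_memory_stats(device)
    step()
    torch.cuda.synchronize(device)  # wait for queued kernels before reading stats
    return bytes_to_gib(torch.cuda.max_memory_allocated(device))
```

Running this with batch_size 1 and then a larger batch size gives a rough idea of how memory scales before committing to a full 80000-iteration run.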

Peterou · Answer 5 · Wed Jan 12 2022 14:48:05 GMT+0800 (China Standard Time)

I think 32 GB should be enough to run the program.
How about following the hint in the error message and setting CUDA_LAUNCH_BLOCKING=1?
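The shell form is to prefix the command with CUDA_LAUNCH_BLOCKING=1, the same way CUDA_VISIBLE_DEVICES=2 is set in the original command. With synchronous kernel launches the traceback points at the operation that actually failed rather than a later one, at the cost of slower training, so it is meant for debugging only. A sketch of setting it from inside the script, assuming you can edit the entry point; enable_cuda_launch_blocking is a hypothetical helper.

```python
import os
import sys
import warnings


def enable_cuda_launch_blocking() -> None:
    """Make CUDA kernel launches synchronous so an error surfaces at the call that caused it.

    Hypothetical helper. The variable is typically read when CUDA is initialized,
    so call this at the very top of the training script, before any tensor is
    moved to the GPU.
    """
    torch = sys.modules.get("torch")  # only inspect torch if it was already imported
    if torch is not None and torch.cuda.is_initialized():
        warnings.warn(
            "CUDA is already initialized; CUDA_LAUNCH_BLOCKING=1 will likely have no effect",
            RuntimeWarning,
        )
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


if __name__ == "__main__":
    enable_cuda_launch_blocking()
    # Import torch and start training after this point.
    print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```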

Long-Nhật nè · Answer 6 · Thu Jan 13 2022 03:15:18 GMT+0800 (China Standard Time)

It seems that something is wrong with my GPU; I switched to another one and it works!
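The thread does not say what was wrong with the first GPU. One common cause of out-of-memory errors even at batch size 1 is another process already holding most of that GPU's memory, which is quick to rule out. A minimal sketch, assuming nvidia-smi is installed; parse_gpu_memory and query_gpus are hypothetical helpers. Note that nvidia-smi numbers GPUs in PCI bus order while CUDA orders them fastest first by default, so index 2 in CUDA_VISIBLE_DEVICES may not be GPU 2 in nvidia-smi unless CUDA_DEVICE_ORDER=PCI_BUS_ID is also set.

```python
from __future__ import annotations

import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> list[dict]:
    """Parse the CSV produced by QUERY; memory values are in MiB because of `nounits`."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        gpus.append({"index": int(index), "used_mib": int(used), "total_mib": int(total)})
    return gpus


def query_gpus() -> list[dict]:
    """Return per-GPU memory usage, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:  # a faulty GPU or driver can make nvidia-smi fail
        return []
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        free = gpu["total_mib"] - gpu["used_mib"]
        print(f"GPU {gpu['index']}: {gpu['used_mib']} MiB used, {free} MiB free")
```

If the GPU you selected already shows most of its memory as used before training starts, pick another one or stop the process holding it.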

Btw, there are many scripts here (ffhq_exp, ffhq_exp_v1...).
What is the difference, and which one should I use?

Peterou · Answer 7 · Thu Jan 13 2022 11:54:10 GMT+0800 (China Standard Time)

Hi, I have added running instructions to the README.