Cuda out of memory

Question

Vicvickyue opened this issue 3 months ago

Hello! Thank you so much for your amazing work. I'm posting to ask about a CUDA out of memory error that I encounter when running the InstanceDiffusion inference demo. I'm using a single RTX 3050 to run the program, and no other process is using the GPU while it runs.

XuDong Frank Wang · Answer 1 · Tue Apr 30 2024 01:09:11 GMT+0800 (China Standard Time)

Hi, you may want to use a smaller '--num_images'. Also, please confirm that flash attention (which we use by default) is actually being used, since it reduces memory usage.
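A quick way to check whether the flash attention package is importable in the current environment is sketched below. It assumes the package is installed under the module name flash_attn (the name used later in this thread); the helper has_module is illustrative and not part of InstanceDiffusion. It only confirms that the package is installed, not that the model actually uses it at run time.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Assumption: the flash attention package is importable as "flash_attn".
    if has_module("flash_attn"):
        print("flash_attn is installed in this environment")
    else:
        print("flash_attn was not found in this environment")
```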

raindrop313 · Answer 2 · Thu Jun 06 2024 14:54:37 GMT+0800 (China Standard Time)

I encountered the same issue, and reducing "--num_images" did not resolve it. The error message indicates that the out-of-memory error occurs during the model weight loading phase. Could you please provide an estimate of how much GPU memory is required to run this project?

@frank-xwang

MilkyCoco · Answer 3 · Sat Jun 08 2024 19:22:54 GMT+0800 (China Standard Time)

Hello, I have run into the same problem. I tried reducing --num_images to 2 or 1 and confirmed that flash_attn runs normally. I ran the demo on an RTX 4060 with 8 GB of memory, and I would like to know how much GPU memory is needed for training and deployment. @frank-xwang Thanks, and I look forward to your reply.

XuDong Frank Wang · Answer 4 · Tue Jun 11 2024 00:35:00 GMT+0800 (China Standard Time)

Apologies for the delayed response.

Thank you for your interest in InstanceDiffusion. I have made further optimizations to reduce the code's memory usage. Please update to the latest version by pulling the new InstanceDiffusion code. To run the updated code, you will likely need a GPU with at least 13 GB of memory. I recently tested it locally on RTX 6000 GPUs, which have 24 GB of memory, and inference consumed about 12.8 GB. For training the model, we use A100 GPUs with 80 GB of memory.
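For anyone who wants to compare their own GPU against the roughly 13 GB estimate above before launching inference, here is a minimal sketch. It is not part of the repository. It assumes nvidia-smi is on the PATH, and the names REQUIRED_MIB, parse_gpu_memory, and query_gpus are made up for illustration. The threshold treats 13 GB as 13 * 1024 MiB, which is only an approximation.

```python
import subprocess

# Assumption: ~13 GB inference estimate, expressed in MiB as reported by nvidia-smi.
REQUIRED_MIB = 13 * 1024


def parse_gpu_memory(csv_text: str) -> list[tuple[str, int, int]]:
    """Parse 'name, total, used' lines (memory in MiB) into tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split from the right so the two numeric fields are always last.
        name, total, used = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append((name, int(total), int(used)))
    return gpus


def query_gpus() -> list[tuple[str, int, int]]:
    """Ask nvidia-smi for each GPU's name, total memory, and used memory."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    try:
        gpus = query_gpus()
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError):
        gpus = []
        print("Could not read GPU memory from nvidia-smi")
    for name, total, used in gpus:
        free = total - used
        verdict = "likely enough" if free >= REQUIRED_MIB else "likely too small"
        print(f"{name}: {free} MiB free of {total} MiB ({verdict})")
```

By this check, the 8 GB cards mentioned earlier in the thread fall below the threshold even with nothing else running on the GPU.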

The command I used for model inference:

CUDA_VISIBLE_DEVICES=6 python inference.py \
  --num_images 8 \
  --output OUTPUT/demo/ \
  --input_json demos/demo_cat_dog_robin.json \
  --ckpt pretrained/instancediffusion_sd15.pth \
  --test_config configs/test_box.yaml \
  --guidance_scale 7.5 \
  --alpha 0.75 \
  --seed 4 \
  --mis 0.3 \
  --cascade_strength 0.3

The memory usage is attached below:

Hope it helps!