Stability-AI / generative-models

Generative Models by Stability AI

sv3d_u - Out of Memory

Marcophono2 opened this issue · comments

Hello!

No matter how my input image is "prepared" (first attempt: 576x576 with a white background), I get an out-of-memory error. I have 3x RTX 4090 with 24 GB each, and the problem is the same on the default graphics card and also on the third one, as the log below shows. The VRAM was fully available to the model and the inference process (aside from 9 MB). Is this a known problem?
I set up a conda environment with Python 3.10.6, not a plain Python venv.

Best regards
Marc

CUDA_VISIBLE_DEVICES=2 python scripts/sampling/simple_video_sample.py --input_path /home/marc/Desktop/AI/BLIP2/doom.png --version sv3d_u
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
Initialized embedder #0: FrozenOpenCLIPImagePredictionEmbedder with 683800065 params. Trainable: False
Initialized embedder #1: VideoPredictionEmbedderWithEncoder with 83653863 params. Trainable: False
Initialized embedder #2: ConcatTimestepEmbedderND with 0 params. Trainable: False
Restored from checkpoints/sv3d_u.safetensors with 0 missing and 0 unexpected keys
/home/marc/anaconda3/envs/3D/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
  File "/home/marc/Desktop/AI/generative-models/scripts/sampling/simple_video_sample.py", line 349, in <module>
    Fire(sample)
  File "/home/marc/anaconda3/envs/3D/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/marc/anaconda3/envs/3D/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/marc/anaconda3/envs/3D/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/marc/Desktop/AI/generative-models/scripts/sampling/simple_video_sample.py", line 253, in sample
    samples_x = model.decode_first_stage(samples_z)
  File "/home/marc/anaconda3/envs/3D/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/marc/anaconda3/envs/3D/lib/python3.10/site-packages/sgm/models/diffusion.py", line 130, in decode_first_stage
    out = self.first_stage_model.decode(
  File "/home/marc/anaconda3/envs/3D/lib/python3.10/site-packages/sgm/models/autoencoder.py", line 211, in decode
    x = self.decoder(z, **kwargs)
  File "/home/marc/anaconda3/envs/3D/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/marc/anaconda3/envs/3D/lib/python3.10/site-packages/sgm/modules/diffusionmodules/model.py", line 733, in forward
    h = self.up[i_level].block[i_block](h, temb, **kwargs)
  File "/home/marc/anaconda3/envs/3D/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/marc/anaconda3/envs/3D/lib/python3.10/site-packages/sgm/modules/diffusionmodules/model.py", line 134, in forward
    h = nonlinearity(h)
  File "/home/marc/anaconda3/envs/3D/lib/python3.10/site-packages/sgm/modules/diffusionmodules/model.py", line 49, in nonlinearity
    return x * torch.sigmoid(x)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.21 GiB (GPU 0; 23.65 GiB total capacity; 19.29 GiB already allocated; 105.69 MiB free; 22.04 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Hello! Perhaps you can reduce this value:

decoding_t: int = 14, # Number of frames decoded at a time! This eats most VRAM. Reduce if necessary.

I had to lower it to 6 on an RTX 4090 (24 GB).
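Since the script dispatches `sample()` through Python Fire, `decoding_t` should be overridable as a command-line flag without editing the file (the flag name is assumed from the parameter name in the signature above):

```shell
# decoding_t controls how many frames the VAE decodes per pass; lowering it
# trades a little speed for a large drop in peak VRAM in decode_first_stage,
# which is exactly where the traceback above runs out of memory.
CUDA_VISIBLE_DEVICES=2 python scripts/sampling/simple_video_sample.py \
  --input_path /home/marc/Desktop/AI/BLIP2/doom.png \
  --version sv3d_u \
  --decoding_t 6
```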

Thank you, guys! Indeed, I have in the meantime tried the streamlit version, reduced that value there as well, and it worked! Thank you again!

@Marcophono2 remember that 4090s don't have NVLink, so these kinds of memory issues don't depend on the number of 4090s you have.

@alexrosen45 Yes, I know. I just wanted to make clear that it cannot be caused by the VRAM not being fully available, because the "first" graphics card normally has some baseline usage from the display output (unless it is a headless server without a monitor). But you are right; my description gives the impression that I am wondering why 3x24 GB are not enough.
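For a quick sanity check of how much VRAM each process can actually see, a small helper like the following (hypothetical, built on `torch.cuda.mem_get_info`) can be run before sampling. Note that `CUDA_VISIBLE_DEVICES` remaps indices, so with `CUDA_VISIBLE_DEVICES=2` the third card shows up as device 0 inside the process:

```python
import torch

def vram_report():
    """Return a list of (device_index, free_GiB, total_GiB) for each visible CUDA device."""
    devices = []
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            # mem_get_info reports free and total device memory in bytes
            free_b, total_b = torch.cuda.mem_get_info(i)
            devices.append((i, free_b / 2**30, total_b / 2**30))
    return devices

if __name__ == "__main__":
    for idx, free_gib, total_gib in vram_report():
        print(f"GPU {idx}: {free_gib:.2f} GiB free / {total_gib:.2f} GiB total")
```

On a CPU-only machine this simply returns an empty list, so it is safe to call unconditionally.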