Won't generate Depthmap video with 'dpt_beit_large_512'

Question

eyeEmotion opened this issue 3 months ago

Hi,

I'm converting a test-video with several models, with and without Boost on. Most of the models work, although some I couldn't test out because it took too long.
But when I select DPT_BEIT_LARGE_512 (still have to test if the 384 has the same problem), I first get some warnings, then it keeps on creating depthmaps per frame. When that's done, it starts generating the output, but fails at it.

Edit: Tried the 384 and that one works fine. Tried the 512 again with a different video-file, that was also bigger/longer. But again, at 13%, got the same warnings.

I'm using the depthmap extension within Stable Diffusion (Automatic1111, or whatever it is called).
Here is what is outputted in the commandline:

To create a public link, set share=True in launch().
Startup time: 28.1s (prepare environment: 9.4s, import torch: 5.2s, import gradio: 3.4s, setup paths: 3.2s, initialize shared: 0.4s, other imports: 2.2s, setup codeformer: 0.3s, load scripts: 3.6s, create ui: 0.2s, gradio launch: 0.4s).
Creating model from config: D:\Documenten\stable-diffusion-webui\configs\v1-inference.yaml
Applying attention optimization: Doggettx... done.
Model loaded in 47.1s (load weights from disk: 36.7s, create model: 0.2s, apply weights to model: 1.2s, apply half(): 1.9s, load textual inversion embeddings: 0.2s, calculate empty prompt: 6.8s).
Generating depthmaps for the video frames
DepthMap v0.4.6 (500ee72)
device: cuda
Loading model(s) ..
Loading model weights from ./models/midas/dpt_beit_large_512.pt
Computing output(s) ..
13%|█████████▉ | 222/1757 [03:24<23:09, 1.10it/s]WARNING:py.warnings:D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py:196: RuntimeWarning: invalid value encountered in subtract
out = (out - out.min()) / (out.max() - out.min()) # normalize to [0; 1]

WARNING:py.warnings:D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py:196: RuntimeWarning: invalid value encountered in divide
out = (out - out.min()) / (out.max() - out.min()) # normalize to [0; 1]

100%|██████████████████████████████████████████████████████████████████████████████| 1757/1757 [26:14<00:00, 1.12it/s]
Computing output(s) done.
All done.

Processing generated depthmaps
Generating output frames
DepthMap v0.4.6 (500ee72)
device: cuda
Computing output(s) ..
99%|█████████████████████████████████████████████████████████████████████████████ | 1737/1757 [00:23<00:00, 73.84it/s]
Fail.

Traceback (most recent call last):
File "D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\common_ui.py", line 457, in run_generate
ret = video_mode.gen_video(
File "D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\video_mode.py", line 159, in gen_video
img_results = list(core.core_generation_funnel(None, input_images, input_depths, None, inp))
File "D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py", line 322, in core_generation_funnel
raise e
File "D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py", line 139, in core_generation_funnel
if inputdepthmaps is not None and inputdepthmaps[count] is not None:
IndexError: list index out of range

My hardware is up to snuff, because I can even use BOOST without my computer breaking a sweat (it just takes a long time).
Got an i7-13700K, 2x16GB 3600 DDR4 RAM with an Nvidia RTX 3060 OC 12GB. (Btw, is there a setting where I can dedicate more VRAM to it? It seems to mostly run around 5GB RAM, sometimes 7GB RAM.)

Also, how can I add other models to the dropdown list? For example, I also want to try 'dpt_swin_large_384.pt' and 'dpt_swin2_large_384.pt'.

eyeEmotion · Answer 1 · Mon Mar 04 2024 07:46:38 GMT+0800 (China Standard Time)

Manually cut and rendered my video file again. This time the depth generation got all the way through the first stage.
But now it gave an error in the "Generating output frames" section. It was almost at the last frames when I suddenly got this error:

Startup time: 8.1s (prepare environment: 1.9s, import torch: 2.2s, import gradio: 0.8s, setup paths: 0.7s, initialize shared: 0.2s, other imports: 0.4s, load scripts: 1.4s, create ui: 0.2s, gradio launch: 0.2s).
Creating model from config: C:\AI\stable-diffusion-webui\configs\v1-inference.yaml
Applying attention optimization: Doggettx... done.
Model loaded in 6.2s (load weights from disk: 1.9s, create model: 0.2s, apply weights to model: 1.2s, apply half(): 1.3s, calculate empty prompt: 1.5s).
Generating depthmaps for the video frames
DepthMap v0.4.6 (500ee72)
device: cuda
Loading model(s) ..
Loading model weights from ./models/midas/dpt_beit_large_512.pt
Computing output(s) ..
100%|████████████████████████████████████████████████████████████████████████████| 4320/4320 [1:06:43<00:00, 1.08it/s]
Computing output(s) done.
All done.

Processing generated depthmaps
Generating output frames
DepthMap v0.4.6 (500ee72)
device: cuda
Computing output(s) ..
99%|█████████████████████████████████████████████████████████████████████████████▎| 4283/4320 [04:36<00:02, 15.47it/s]
Fail.

Traceback (most recent call last):
File "C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\common_ui.py", line 457, in run_generate
ret = video_mode.gen_video(
File "C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\video_mode.py", line 159, in gen_video
img_results = list(core.core_generation_funnel(None, input_images, input_depths, None, inp))
File "C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py", line 322, in core_generation_funnel
raise e
File "C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py", line 139, in core_generation_funnel
if inputdepthmaps is not None and inputdepthmaps[count] is not None:
IndexError: list index out of range

It was a 3 minute video. Nothing seemed to go wrong at the hardware level when watching in the Task Manager. I didn't run short on RAM or virtual memory.
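
A note on the traceback: the IndexError is raised by `inputdepthmaps[count]`, which means the loop asked for a depth map at an index the list does not have, so the list of depth maps is shorter than the number of frames being processed. Why it is shorter is not visible in the log. A minimal sketch of a check that could be run before pairing frames with depth maps is given below; `check_frame_depth_pairing` is a hypothetical helper for illustration, not a function from the extension.

```python
def check_frame_depth_pairing(frames, depths):
    """Verify that every frame has a depth map slot before processing.

    Hypothetical helper, not part of the depthmap extension.
    Returns the indices whose depth map is None; raises ValueError
    if the two lists differ in length.
    """
    n_frames, n_depths = len(frames), len(depths)
    if n_frames != n_depths:
        raise ValueError(
            f"{n_frames} frames but {n_depths} depth maps: "
            f"{abs(n_frames - n_depths)} entries cannot be paired"
        )
    # Same length: report slots that exist but hold no depth map.
    return [i for i, depth in enumerate(depths) if depth is None]
```

With a check like this, a mismatch would show up as a clear message before any output frames are generated, instead of failing near the end of the run.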

eyeEmotion · Answer 2 · Mon Mar 04 2024 14:24:03 GMT+0800 (China Standard Time)

Tried it again today, this time with a 2 minute and 30 seconds video file.
Now I got this error again:

DepthMap v0.4.6 (500ee72)
device: cuda
Loading model(s) ..
Loading model weights from ./models/midas/dpt_beit_large_512.pt
Computing output(s) ..
30%|███████████████████████▍ | 1082/3600 [16:09<36:27, 1.15it/s]WARNING:py.warnings:C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py:196: RuntimeWarning: invalid value encountered in subtract
out = (out - out.min()) / (out.max() - out.min()) # normalize to [0; 1]

WARNING:py.warnings:C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py:196: RuntimeWarning: invalid value encountered in divide
out = (out - out.min()) / (out.max() - out.min()) # normalize to [0; 1]

WARNING:py.warnings:C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py:45: RuntimeWarning: invalid value encountered in cast
return out.astype("uint16")

100%|██████████████████████████████████████████████████████████████████████████████| 3600/3600 [53:35<00:00, 1.12it/s]
Computing output(s) done.
All done.

Processing generated depthmaps
Generating output frames
DepthMap v0.4.6 (500ee72)
device: cuda
Computing output(s) ..
99%|█████████████████████████████████████████████████████████████████████████████▎| 3569/3600 [03:25<00:01, 17.33it/s]
Fail.

Traceback (most recent call last):
File "C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\common_ui.py", line 457, in run_generate
ret = video_mode.gen_video(
File "C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\video_mode.py", line 159, in gen_video
img_results = list(core.core_generation_funnel(None, input_images, input_depths, None, inp))
File "C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py", line 322, in core_generation_funnel
raise e
File "C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py", line 139, in core_generation_funnel
if inputdepthmaps is not None and inputdepthmaps[count] is not None:
IndexError: list index out of range

It seems to always go wrong somewhere when using 'dpt_beit_large_512'.
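
A note on the warnings: NumPy reports "invalid value encountered in subtract" when an operation such as inf - inf produces NaN, so the warnings at core.py:196 suggest that the model output for that frame contained non-finite values (inf or NaN). Where those values come from is not shown in the log. The sketch below shows one way the same min-max normalization could be made tolerant of such values; `normalize_depth` is a hypothetical name, and replacing bad pixels with the finite minimum or maximum is an assumption, not what the extension does.

```python
import numpy as np


def normalize_depth(out):
    """Min-max normalize a depth prediction to [0, 1], tolerating NaN/inf.

    Hypothetical replacement for the one-line normalization shown in the
    warning; the handling of bad pixels is an illustrative choice.
    """
    out = np.asarray(out, dtype=np.float64)
    finite = np.isfinite(out)
    if not finite.any():
        # Nothing usable in this frame: return a flat map.
        return np.zeros_like(out)
    lo = out[finite].min()
    hi = out[finite].max()
    # Map NaN and -inf to the finite minimum, +inf to the finite maximum.
    cleaned = np.nan_to_num(out, nan=lo, posinf=hi, neginf=lo)
    span = hi - lo
    if span == 0:
        # Constant frame: avoid dividing by zero.
        return np.zeros_like(cleaned)
    return (cleaned - lo) / span
```

This only hides the symptom for a single frame. If many frames contain non-finite values, the underlying cause in the model output would still need to be found.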

Semjon Kravtšenko · Answer 3 · Thu Mar 07 2024 07:06:40 GMT+0800 (China Standard Time)

This issue makes my heart bleed... And I probably won't have time in the near future to fix this. Double downer. You mention in the other issue that you managed to kinda make it work, but the experience was nowhere near seamless, whereas it would be nice if it was.

Btw, you could try the Depth Anything model - it works great and does not require BOOST (it is better to disable BOOST because it is resource and VRAM hungry).

eyeEmotion · Answer 4 · Tue Mar 12 2024 05:18:46 GMT+0800 (China Standard Time)

Hi,

I'm indeed currently using the Depth Anything model, after someone at a forum for 2d-to-3d movie conversion suggested it.
I didn't use that one at first, because that one didn't work for me either. But then I discovered I also had to install ControlNet and add arguments to the launch .bat in order for it to work.

It's just slightly slower than Midas V3.1 BEIT_L_384, but it is so much more accurate, with the right sort of details, and a lot more stable.
Downside is, I had to divide the movie into even smaller parts for it to work: 2 minute clips instead of 3 minutes.
Midas V3.1 BEIT_L_384 takes around 30 minutes for a 3 minute clip, while Depth Anything takes around 40 minutes for a 2 minute clip.
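
For anyone who wants to automate the splitting into 2 minute clips, here is a minimal sketch that builds an ffmpeg command using its segment muxer. It assumes ffmpeg is installed and on the PATH; `build_split_command`, the file names and the output pattern are placeholders for illustration.

```python
def build_split_command(src, clip_seconds=120, out_pattern="clip_%03d.mp4"):
    """Build an ffmpeg command that cuts src into clips of clip_seconds.

    Hypothetical helper. Streams are copied without re-encoding, so cuts
    land on keyframes and clip lengths are only approximately clip_seconds.
    """
    return [
        "ffmpeg",
        "-i", str(src),
        "-map", "0",                  # keep all streams of the input
        "-c", "copy",                 # no re-encoding
        "-f", "segment",              # use the segment muxer
        "-segment_time", str(clip_seconds),
        "-reset_timestamps", "1",     # each clip starts at timestamp 0
        out_pattern,
    ]


# Usage (runs ffmpeg, so it is left as a comment here):
#   import subprocess
#   subprocess.run(build_split_command("movie.mp4"), check=True)
```

Because the clips are cut on keyframes, their frame counts can differ a little from clip to clip.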

Does Midas V3.1 BEIT_L_512 differ much from BEIT_L_384?

Cheers