TMElyralab / MusePose

MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation

Pls help me

xy-soft opened this issue

I have finished the first step: python pose_align.py --imgfn_refer ./assets/images/ref.png --vidfn ./assets/videos/dance.mp4
but something went wrong at the next step:

Moviepy - Done !
Moviepy - video ready ./assets/poses/align/img_ref_video_dance.mp4
pose align done
(musePose) [root@localhost MusePose]# python test_stage_2.py --config ./configs/test_stage_2.yaml
Width: 768
Height: 768
Length: 300
Slice: 48
Overlap: 4
Classifier free guidance: 3.5
DDIM sampling steps : 20
skip 1
Traceback (most recent call last):
File "/home/wangxin/MusePose/test_stage_2.py", line 238, in
main()
File "/home/wangxin/MusePose/test_stage_2.py", line 76, in main
vae = AutoencoderKL.from_pretrained(
File "/data/glm3/anaconda3/envs/musePose/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 805, in from_pretrained
raise ValueError(
ValueError: Cannot load <class 'diffusers.models.autoencoder_kl.AutoencoderKL'> from ./pretrained_weights/sd-vae-ft-mse because the following keys are missing:
decoder.mid_block.attentions.0.to_q.weight, decoder.up_blocks.3.resnets.0.conv2.bias, encoder.down_blocks.2.resnets.1.conv2.bias, decoder.up_blocks.0.resnets.2.conv2.weight, encoder.down_blocks.0.resnets.0.conv2.bias, decoder.up_blocks.1.resnets.0.conv2.bias, encoder.conv_in.weight, decoder.up_blocks.1.resnets.2.norm1.weight, decoder.up_blocks.3.resnets.1.norm1.weight, decoder.up_blocks.3.resnets.1.conv2.bias, decoder.up_blocks.2.resnets.0.conv1.weight, decoder.up_blocks.1.resnets.2.conv1.weight, encoder.down_blocks.1.downsamplers.0.conv.weight, encoder.down_blocks.3.resnets.0.norm2.weight, encoder.mid_block.resnets.0.norm1.bias, decoder.up_blocks.2.resnets.0.norm1.bias, decoder.mid_block.resnets.0.norm2.bias, decoder.up_blocks.3.resnets.0.conv_shortcut.bias, encoder.down_blocks.3.resnets.0.conv1.bias, decoder.up_blocks.2.resnets.1.norm1.weight, decoder.up_blocks.0.resnets.0.norm2.weight, encoder.down_blocks.1.resnets.0.norm1.bias, decoder.up_blocks.2.resnets.2.norm2.weight, quant_conv.weight, decoder.up_blocks.3.resnets.0.conv_shortcut.weight, decoder.up_blocks.0.resnets.2.norm2.weight, decoder.up_blocks.0.resnets.0.norm1.bias, encoder.down_blocks.1.resnets.1.conv1.weight, encoder.down_blocks.1.resnets.0.conv_shortcut.weight, encoder.down_blocks.0.resnets.1.norm1.weight, encoder.down_blocks.3.resnets.1.conv1.bias, encoder.down_blocks.1.resnets.0.conv2.bias, encoder.mid_block.resnets.0.conv2.bias, decoder.mid_block.attentions.0.group_norm.bias, encoder.down_blocks.3.resnets.1.conv2.bias, encoder.down_blocks.2.downsamplers.0.conv.weight, encoder.mid_block.resnets.1.conv2.weight, encoder.down_blocks.0.resnets.1.norm1.bias, encoder.mid_block.attentions.0.to_q.weight, decoder.up_blocks.1.resnets.1.norm2.weight, decoder.up_blocks.1.upsamplers.0.conv.bias, encoder.down_blocks.3.resnets.0.norm2.bias, decoder.up_blocks.3.resnets.1.norm2.bias, encoder.mid_block.attentions.0.to_out.0.weight, decoder.up_blocks.1.resnets.2.norm1.bias, encoder.down_blocks.2.resnets.1.norm2.weight, decoder.up_blocks.0.resnets.2.conv2.bias, decoder.mid_block.resnets.0.conv2.weight, encoder.down_blocks.1.resnets.0.conv_shortcut.bias, decoder.conv_norm_out.weight, decoder.mid_block.resnets.1.norm1.bias, encoder.mid_block.resnets.0.norm1.weight, encoder.down_blocks.2.resnets.0.norm2.bias, decoder.mid_block.attentions.0.to_k.bias, encoder.down_blocks.1.resnets.1.norm1.bias, encoder.mid_block.attentions.0.to_k.bias, encoder.conv_norm_out.weight, decoder.up_blocks.0.resnets.1.conv1.bias, decoder.up_blocks.0.resnets.2.conv1.weight, decoder.up_blocks.3.resnets.0.conv1.bias, encoder.down_blocks.2.resnets.0.conv1.weight, encoder.down_blocks.1.resnets.0.conv1.bias, encoder.conv_in.bias, decoder.up_blocks.3.resnets.2.norm1.weight, encoder.down_blocks.3.resnets.1.norm2.weight, decoder.mid_block.resnets.1.conv1.bias, decoder.up_blocks.2.resnets.0.conv_shortcut.bias, decoder.conv_in.weight, decoder.up_blocks.2.resnets.2.conv1.weight, decoder.up_blocks.3.resnets.1.norm1.bias, decoder.up_blocks.1.resnets.2.norm2.bias, decoder.mid_block.attentions.0.to_out.0.weight, encoder.down_blocks.0.resnets.1.norm2.bias, decoder.up_blocks.3.resnets.1.conv1.weight, encoder.down_blocks.3.resnets.0.norm1.weight, encoder.conv_norm_out.bias, encoder.down_blocks.0.resnets.0.norm2.weight, encoder.mid_block.resnets.0.conv1.weight, encoder.mid_block.resnets.0.conv2.weight, decoder.conv_in.bias, decoder.up_blocks.0.resnets.2.norm1.bias, encoder.down_blocks.0.resnets.0.norm1.bias, decoder.up_blocks.1.resnets.1.conv1.weight, 
decoder.mid_block.attentions.0.to_out.0.bias, encoder.down_blocks.0.resnets.0.conv1.bias, decoder.up_blocks.2.resnets.2.norm1.weight, decoder.mid_block.resnets.0.norm1.bias, encoder.down_blocks.0.resnets.1.conv1.bias, decoder.up_blocks.0.resnets.0.conv2.bias, decoder.up_blocks.3.resnets.2.conv1.bias, decoder.up_blocks.1.resnets.0.conv1.bias, decoder.up_blocks.2.resnets.0.norm2.weight, decoder.up_blocks.2.resnets.1.conv2.weight, decoder.up_blocks.1.resnets.1.conv1.bias, encoder.down_blocks.2.resnets.1.conv1.weight, encoder.down_blocks.0.downsamplers.0.conv.weight, encoder.down_blocks.1.resnets.0.conv2.weight, decoder.up_blocks.1.resnets.2.conv1.bias, decoder.mid_block.resnets.1.norm2.bias, encoder.mid_block.attentions.0.to_q.bias, decoder.mid_block.resnets.1.conv2.bias, encoder.down_blocks.2.resnets.1.norm1.bias, encoder.mid_block.attentions.0.group_norm.weight, encoder.down_blocks.2.resnets.0.norm1.weight, encoder.mid_block.resnets.1.norm2.bias, decoder.conv_out.bias, encoder.down_blocks.0.resnets.1.conv2.weight, encoder.down_blocks.1.resnets.1.conv2.weight, decoder.up_blocks.2.resnets.0.conv1.bias, decoder.up_blocks.3.resnets.2.conv2.weight, decoder.up_blocks.0.upsamplers.0.conv.bias, decoder.up_blocks.0.upsamplers.0.conv.weight, decoder.up_blocks.3.resnets.0.conv2.weight, decoder.up_blocks.2.resnets.0.conv_shortcut.weight, decoder.up_blocks.0.resnets.1.conv1.weight, decoder.up_blocks.0.resnets.2.norm1.weight, decoder.up_blocks.1.resnets.0.norm2.bias, encoder.down_blocks.3.resnets.1.norm2.bias, encoder.down_blocks.1.resnets.1.conv2.bias, decoder.up_blocks.2.resnets.2.norm1.bias, decoder.up_blocks.1.resnets.2.conv2.weight, decoder.up_blocks.2.resnets.0.norm2.bias, encoder.down_blocks.2.resnets.1.conv2.weight, encoder.down_blocks.0.resnets.1.conv1.weight, encoder.down_blocks.1.resnets.0.norm2.bias, encoder.mid_block.resnets.0.norm2.weight, decoder.up_blocks.1.resnets.0.norm1.bias, decoder.up_blocks.2.resnets.1.norm1.bias, decoder.mid_block.attentions.0.to_k.weight, decoder.up_blocks.0.resnets.1.conv2.bias, decoder.up_blocks.2.upsamplers.0.conv.bias, quant_conv.bias, decoder.up_blocks.3.resnets.0.norm2.weight, decoder.up_blocks.1.resnets.1.conv2.weight, encoder.mid_block.resnets.1.norm1.weight, encoder.down_blocks.2.resnets.1.norm2.bias, decoder.mid_block.resnets.0.conv1.bias, decoder.up_blocks.3.resnets.2.norm2.bias, encoder.down_blocks.0.downsamplers.0.conv.bias, decoder.up_blocks.0.resnets.2.conv1.bias, decoder.up_blocks.3.resnets.0.norm1.weight, encoder.down_blocks.2.resnets.0.norm1.bias, encoder.down_blocks.2.resnets.0.conv1.bias, decoder.up_blocks.2.resnets.1.norm2.weight, decoder.up_blocks.1.resnets.2.norm2.weight, decoder.mid_block.resnets.0.conv1.weight, decoder.mid_block.attentions.0.to_v.bias, decoder.up_blocks.2.resnets.2.conv1.bias, encoder.down_blocks.3.resnets.1.conv2.weight, encoder.down_blocks.3.resnets.1.norm1.bias, encoder.mid_block.attentions.0.to_k.weight, decoder.mid_block.resnets.0.conv2.bias, decoder.up_blocks.1.resnets.2.conv2.bias, decoder.mid_block.resnets.0.norm1.weight, decoder.mid_block.attentions.0.to_v.weight, encoder.mid_block.resnets.1.norm1.bias, decoder.conv_out.weight, encoder.down_blocks.1.resnets.1.conv1.bias, decoder.up_blocks.3.resnets.2.norm1.bias, decoder.mid_block.resnets.1.conv2.weight, encoder.down_blocks.0.resnets.0.conv2.weight, decoder.up_blocks.3.resnets.0.norm1.bias, encoder.down_blocks.2.resnets.0.conv_shortcut.weight, decoder.up_blocks.2.resnets.0.conv2.bias, decoder.up_blocks.2.resnets.1.conv2.bias, 
encoder.mid_block.resnets.1.conv1.weight, encoder.down_blocks.0.resnets.1.conv2.bias, encoder.down_blocks.3.resnets.0.norm1.bias, encoder.mid_block.attentions.0.group_norm.bias, encoder.mid_block.attentions.0.to_v.weight, encoder.down_blocks.1.resnets.1.norm2.bias, decoder.up_blocks.1.resnets.1.conv2.bias, encoder.mid_block.resnets.1.norm2.weight, encoder.mid_block.resnets.0.conv1.bias, decoder.up_blocks.2.resnets.1.norm2.bias, decoder.mid_block.resnets.1.norm2.weight, decoder.mid_block.attentions.0.group_norm.weight, decoder.up_blocks.2.resnets.1.conv1.weight, post_quant_conv.weight, decoder.up_blocks.2.resnets.0.norm1.weight, encoder.down_blocks.1.resnets.1.norm1.weight, encoder.mid_block.resnets.1.conv2.bias, decoder.up_blocks.0.resnets.1.conv2.weight, encoder.mid_block.attentions.0.to_out.0.bias, decoder.up_blocks.3.resnets.0.conv1.weight, decoder.up_blocks.0.resnets.1.norm1.bias, decoder.up_blocks.1.resnets.1.norm1.weight, decoder.up_blocks.3.resnets.1.conv1.bias, decoder.mid_block.resnets.1.norm1.weight, encoder.mid_block.resnets.1.conv1.bias, decoder.up_blocks.0.resnets.1.norm1.weight, encoder.down_blocks.2.downsamplers.0.conv.bias, decoder.up_blocks.2.resnets.2.conv2.weight, encoder.down_blocks.2.resnets.1.norm1.weight, decoder.up_blocks.1.resnets.0.norm2.weight, decoder.up_blocks.0.resnets.0.conv2.weight, encoder.down_blocks.1.resnets.0.conv1.weight, decoder.up_blocks.0.resnets.0.conv1.bias, encoder.down_blocks.1.downsamplers.0.conv.bias, decoder.up_blocks.0.resnets.1.norm2.weight, encoder.down_blocks.0.resnets.0.conv1.weight, decoder.up_blocks.2.resnets.0.conv2.weight, decoder.mid_block.resnets.1.conv1.weight, encoder.down_blocks.2.resnets.1.conv1.bias, encoder.down_blocks.0.resnets.1.norm2.weight, decoder.up_blocks.3.resnets.2.conv1.weight, encoder.down_blocks.2.resnets.0.norm2.weight, encoder.down_blocks.1.resnets.0.norm2.weight, encoder.down_blocks.3.resnets.1.conv1.weight, encoder.mid_block.resnets.0.norm2.bias, decoder.up_blocks.1.resnets.0.conv1.weight, encoder.down_blocks.2.resnets.0.conv_shortcut.bias, decoder.up_blocks.3.resnets.2.conv2.bias, encoder.down_blocks.3.resnets.0.conv2.weight, post_quant_conv.bias, encoder.down_blocks.2.resnets.0.conv2.bias, encoder.down_blocks.3.resnets.0.conv1.weight, encoder.conv_out.bias, decoder.up_blocks.0.resnets.0.conv1.weight, decoder.up_blocks.1.resnets.0.conv2.weight, decoder.up_blocks.2.resnets.2.conv2.bias, encoder.down_blocks.0.resnets.0.norm2.bias, decoder.conv_norm_out.bias, decoder.up_blocks.1.resnets.1.norm1.bias, encoder.down_blocks.2.resnets.0.conv2.weight, encoder.conv_out.weight, decoder.up_blocks.1.upsamplers.0.conv.weight, decoder.up_blocks.0.resnets.1.norm2.bias, decoder.up_blocks.1.resnets.1.norm2.bias, decoder.up_blocks.3.resnets.0.norm2.bias, encoder.down_blocks.1.resnets.1.norm2.weight, decoder.up_blocks.1.resnets.0.norm1.weight, decoder.up_blocks.2.resnets.2.norm2.bias, decoder.up_blocks.3.resnets.2.norm2.weight, decoder.up_blocks.0.resnets.0.norm2.bias, encoder.mid_block.attentions.0.to_v.bias, encoder.down_blocks.3.resnets.1.norm1.weight, decoder.up_blocks.2.upsamplers.0.conv.weight, decoder.up_blocks.2.resnets.1.conv1.bias, decoder.up_blocks.3.resnets.1.conv2.weight, encoder.down_blocks.0.resnets.0.norm1.weight, encoder.down_blocks.1.resnets.0.norm1.weight, decoder.mid_block.resnets.0.norm2.weight, decoder.up_blocks.0.resnets.2.norm2.bias, encoder.down_blocks.3.resnets.0.conv2.bias, decoder.mid_block.attentions.0.to_q.bias, decoder.up_blocks.3.resnets.1.norm2.weight, decoder.up_blocks.0.resnets.0.norm1.weight.
Please make sure to pass low_cpu_mem_usage=False and device_map=None if you want to randomly initialize those weights or else make sure your checkpoint file is correct.

Your sd-vae-ft-mse model does not match. Check whether all downloaded models are correct and whether the model path in the test_stage_2.yaml file is correct.
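
One way to narrow this down is to check that the files on disk are actually complete before touching the loading code. A minimal sketch, assuming the default ./pretrained_weights/sd-vae-ft-mse path from the config; the file names below are an assumption (your download may use diffusion_pytorch_model.safetensors instead of .bin):

import os

# Hypothetical quick check (not part of MusePose): make sure the VAE folder
# contains real, full-size weight files rather than missing files or tiny
# git-lfs pointer stubs.
vae_dir = "./pretrained_weights/sd-vae-ft-mse"
for name in ["config.json", "diffusion_pytorch_model.bin"]:
    path = os.path.join(vae_dir, name)
    if not os.path.isfile(path):
        print("missing:", path)
    else:
        size_mb = os.path.getsize(path) / 1e6
        # the VAE weights should be on the order of a few hundred MB; a file of
        # only a few KB is likely a git-lfs pointer or a truncated download
        print(f"{path}: {size_mb:.1f} MB")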

Modifying the code as follows fixes it:
vae = AutoencoderKL.from_pretrained(
    config.pretrained_vae_path,
    low_cpu_mem_usage=False,  # load weights eagerly instead of via the meta device
).to("cuda", dtype=weight_dtype)