Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.

Some weights of AutoencoderKL were not initialized from the model checkpoint at /path/to/Latte/t2v_required_models/ and are newly initialized because the shapes did not match:

likeatingcake opened this issue · comments

  • decoder.conv_in.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.conv_in.weight: found shape torch.Size([512, 4, 3, 3]) in the checkpoint and torch.Size([64, 4, 3, 3]) in the model instantiated
  • decoder.conv_norm_out.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.conv_norm_out.weight: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.conv_out.weight: found shape torch.Size([3, 128, 3, 3]) in the checkpoint and torch.Size([3, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.attentions.0.group_norm.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.group_norm.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_k.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_k.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_out.0.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_out.0.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_q.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_q.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_v.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_v.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • decoder.mid_block.resnets.0.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.resnets.0.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.resnets.0.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.resnets.1.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.resnets.1.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.conv_in.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.conv_in.weight: found shape torch.Size([128, 3, 3, 3]) in the checkpoint and torch.Size([64, 3, 3, 3]) in the model instantiated
  • encoder.conv_norm_out.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.conv_norm_out.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.conv_out.weight: found shape torch.Size([8, 512, 3, 3]) in the checkpoint and torch.Size([8, 64, 3, 3]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.conv1.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.conv1.weight: found shape torch.Size([128, 128, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.conv2.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.conv2.weight: found shape torch.Size([128, 128, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm1.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm1.weight: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm2.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm2.weight: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.group_norm.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.group_norm.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_k.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_k.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_out.0.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_out.0.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_q.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_q.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_v.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_v.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.0.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.0.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.1.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.1.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated

When I run `bash sample/t2v.sh`, the shapes in the pretrained checkpoint do not match the instantiated model, as shown above. How can I fix this? Thank you!

It looks like you loaded an incorrect pretrained checkpoint for the VAE. Please check it.
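
For what it's worth, every tensor in the instantiated model has 64 channels, while the checkpoint stores the 128/512-channel layout of a Stable-Diffusion-style VAE, so the config.json sitting next to the weights probably does not describe them. A minimal diagnostic sketch (not part of the Latte codebase; the directory is the placeholder path from the warning, adjust it to your checkout):

```python
# Minimal diagnostic sketch: compare the config that diffusers will
# instantiate against the channel layout the checkpoint implies.
import json
import os

vae_dir = "/path/to/Latte/t2v_required_models/vae"  # placeholder path from the log

with open(os.path.join(vae_dir, "config.json")) as f:
    cfg = json.load(f)

# The 128/512-channel tensors in the checkpoint imply a Stable-Diffusion-style
# VAE, whose config declares block_out_channels = [128, 256, 512, 512].
# If this prints something like [64] (or the key is missing), the config does
# not match the weights and the vae folder should be re-downloaded.
print("block_out_channels:", cfg.get("block_out_channels"))
```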

```
(latte) yueyc@super-AS-4124GS-TNR:~/Latte$ bash sample/t2v.sh
Using model!
Traceback (most recent call last):
  File "/home/yueyc/Latte/sample/sample_t2v.py", line 167, in <module>
    main(OmegaConf.load(args.config))
  File "/home/yueyc/Latte/sample/sample_t2v.py", line 38, in main
    vae = AutoencoderKL.from_pretrained(args.pretrained_model_path, subfolder="vae", torch_dtype=torch.float16).to(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/diffusers/models/modeling_utils.py", line 812, in from_pretrained
    unexpected_keys = load_model_dict_into_meta(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/diffusers/models/modeling_utils.py", line 155, in load_model_dict_into_meta
    raise ValueError(
ValueError: Cannot load /home/yueyc/Latte/t2v_required_models/ because decoder.conv_in.bias expected shape tensor(..., device='meta', size=(64,)), but got torch.Size([512]). If you want to instead overwrite randomly initialized weights, please make sure to pass both low_cpu_mem_usage=False and ignore_mismatched_sizes=True. For more information, see also: huggingface/diffusers#1619 (comment) as an example.
```

When loading the pretrained VAE, I previously added the two arguments low_cpu_mem_usage=False and ignore_mismatched_sizes=True, which produces the warning quoted at the top of this issue; without them, I get the error above.
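
For reference, those two flags are only a workaround, not a fix: ignore_mismatched_sizes=True tells from_pretrained to skip every tensor whose shape disagrees with the checkpoint and leave it randomly initialized, which is exactly the "newly initialized" warning above, and a VAE loaded that way will decode noise. A sketch of both code paths against the diffusers API (the local path comes from the traceback and is specific to this machine):

```python
# Sketch of the two loading modes discussed in this thread.
import torch
from diffusers import AutoencoderKL

pretrained_model_path = "/home/yueyc/Latte/t2v_required_models/"  # from the traceback

# Workaround: loads without raising, but every mismatched tensor is left
# randomly initialized (hence the "newly initialized" warning), so the VAE
# will decode garbage. Useful only to smoke-test the rest of the pipeline.
vae = AutoencoderKL.from_pretrained(
    pretrained_model_path,
    subfolder="vae",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)

# Real fix: re-download a vae folder whose config.json matches its weights;
# then the unmodified call from sample_t2v.py loads cleanly:
# vae = AutoencoderKL.from_pretrained(
#     pretrained_model_path, subfolder="vae", torch_dtype=torch.float16
# )
```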