TMElyralab / MuseTalk

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Was the open-source lip-sync model trained from random initialization, or was the UNet pretrained first and then trained as a lip-sync model?

gobigrassland opened this issue

I noticed that the UNet configuration used here matches the SD1.4 UNet configuration except for two parameters, cross_attention_dim and in_channels:
(1) Lip-sync model UNet: cross_attention_dim=384, in_channels=8
(2) SD1.4 UNet: cross_attention_dim=768, in_channels=4
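
For readers who want to check the comparison themselves, here is a minimal sketch that diffs the two configurations. It only contains the two fields quoted above; the dictionary names and the `config_diff` helper are hypothetical and are not part of the MuseTalk code base, and every other UNet setting is assumed to be identical, as the post states.

```python
# Hypothetical stand-ins for the two UNet configs; only the fields
# quoted in this issue are listed, all others are assumed to be equal.
musetalk_unet = {"cross_attention_dim": 384, "in_channels": 8}
sd14_unet = {"cross_attention_dim": 768, "in_channels": 4}


def config_diff(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


diff = config_diff(musetalk_unet, sd14_unet)
for key, (mt_value, sd_value) in diff.items():
    print(f"{key}: MuseTalk={mt_value}, SD1.4={sd_value}")
```

Both differing fields change weight shapes: in_channels sets the input width of the first convolution, and cross_attention_dim sets the input width of the cross-attention key/value projections. Those layers therefore could not be loaded unchanged from an SD1.4 checkpoint, which is why the initialization question matters.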

It was trained from random initialization.