sstzal / DiffTalk

[CVPR2023] The implementation for "DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation"

Inference question

Bebaam opened this issue

When running inference, I only get an incomplete image with landmarks and mask. What do I need to do in order to get a clean image?
[Attached image: 0000_0000]

I also encountered this problem. It happens because the model parameters released by the author only include the encoder-decoder; the complete model is too large to share. After training it myself, the checkpoint I saved was 8.2 GB.
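If you want to check which parts a checkpoint actually contains, one rough way is to group its parameter names by prefix. The sketch below is only an illustration: `count_by_prefix` is a hypothetical helper, the key names in `example_keys` are made up, and the commented-out loading lines assume a PyTorch checkpoint that may or may not nest its weights under a `"state_dict"` entry.

```python
from collections import Counter


def count_by_prefix(keys, depth=1):
    """Count parameter names by their first `depth` dot-separated components."""
    counts = Counter()
    for key in keys:
        prefix = ".".join(key.split(".")[:depth])
        counts[prefix] += 1
    return dict(counts)


# Hypothetical key names, for illustration only (not read from the real checkpoint).
example_keys = [
    "first_stage_model.encoder.conv_in.weight",
    "first_stage_model.encoder.conv_in.bias",
    "first_stage_model.decoder.conv_out.weight",
]

print(count_by_prefix(example_keys, depth=2))

# With a real file (assumes PyTorch is installed; the path is a placeholder):
#   import torch
#   ckpt = torch.load("path/to/model.ckpt", map_location="cpu")
#   state = ckpt.get("state_dict", ckpt)  # assumption: weights may be nested here
#   print(count_by_prefix(state.keys()))
```

If the summary only lists encoder and decoder groups and nothing that looks like a denoising network, that would be consistent with the incomplete output described above.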

Okay, that is unfortunate. Thank you for the insight.

Hello, may I ask how you extracted the audio features?