wjl520 / talking-face-generation


The purpose of this repo is to collect notes on talking face/head generation.


The methods can be divided into real-time and offline approaches. Our goal is to generate talking faces in real time, so we only discuss methods that run in (nearly) real time.

  • 2D-based methods

  • 3D-based methods

    • images --> 3D rendering and landmarks --> GAN model --> face generation
    • audio --> audio feature extraction --> high-level features
  • NeRF-based methods (non real-time)


  • (1) Chinese
  • (2) English
    • mostly obtained from YouTube.


  • (1) Chinese audio:

    • Use ffmpeg to extract the audio track as a .wav file (xx.wav), usually at a 16 kHz sample rate. The command is: ffmpeg -y -i xx.mp4 -async 1 -ac 1 -vn -acodec pcm_s16le -ar 16000 xx.wav
    • Extract high-level audio features with a pretrained model (such as DeepSpeech), or compute MFCC features directly from the audio file.
  • (2) video:

    • Detect and crop the face, then use ffmpeg to extract the frames as an image set at 25 fps.
    • ffmpeg -y -i xx.mp4 -r 25 ./img/ (ffmpeg expects an output filename pattern after the directory, for example ./img/%06d.jpg)
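The two ffmpeg steps above can be wrapped in small shell functions so they are easy to apply to many videos. This is only a minimal sketch, not the repo's actual script: the function names (extract_audio, extract_frames), the FFMPEG override variable, and the %06d.jpg frame filename pattern are assumptions made for illustration. The ffmpeg flags are copied as-is from the notes. Face detection and cropping are not covered here.

```shell
#!/bin/sh
# Sketch of the audio and frame extraction steps from the notes.
# FFMPEG can be overridden (hypothetical convenience, e.g. FFMPEG=echo for a dry run).
FFMPEG="${FFMPEG:-ffmpeg}"

# Extract a mono, 16-bit PCM, 16 kHz .wav file from a video.
# $1: input video, $2: output .wav path
extract_audio() {
  "$FFMPEG" -y -i "$1" -async 1 -ac 1 -vn -acodec pcm_s16le -ar 16000 "$2"
}

# Dump the video frames at 25 fps into a directory.
# $1: input video, $2: output directory
# Assumption: the numbered pattern %06d.jpg is our choice; the notes only give ./img/
extract_frames() {
  mkdir -p "$2" && "$FFMPEG" -y -i "$1" -r 25 "$2/%06d.jpg"
}
```

For example, extract_audio xx.mp4 xx.wav followed by extract_frames xx.mp4 ./img processes one clip. Setting FFMPEG=echo before calling the functions prints the arguments instead of running ffmpeg, which is handy for checking paths first.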
