📖 A curated list of resources dedicated to talking face.

Awesome Talking Face

This is a repository for organizing papers, code, and other resources related to talking face/head. Most papers are linked to the PDF address provided by "arXiv" or "OpenAccess". However, some papers require an academic license to browse, for example those published in IEEE, Springer, and Elsevier journals.

2022.09 Update!

Thanks to everybody for the PRs! From now on, I'll occasionally include some papers about video-driven talking face generation, because I found that the community is trying to bring video-driven methods into the scope of talking face generation, though this task was originally termed Face Reenactment.

So, if you are looking for video-driven talking face generation, I suggest you star this repo and then search for Face Reenactment; you'll find more :)

One more thing: please correct me if you find that any paper listed as an arXiv paper has since been accepted to a conference or journal.

2021.11 Update!

I updated a batch of papers that appeared in the past few months. In this repo, I intended to cover audio-driven talking face generation works. However, I found that several text-based research works are also very interesting, so I included them here. Enjoy!


  • Datasets and survey


2D Video - Person independent


  • Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks [arXiv 2023] Paper
  • Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis [arXiv 2023] Paper
  • Parametric Implicit Face Representation for Audio-Driven Facial Reenactment [CVPR 2023] Paper
  • Identity-Preserving Talking Face Generation with Landmark and Appearance Priors [CVPR 2023] Paper Code
  • StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator [CVPR 2023] Paper ProjectPage Code
  • Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos [arXiv 2023] Paper ProjectPage
  • Multimodal-driven Talking Face Generation, Face Swapping, Diffusion Model [arXiv 2023] Paper
  • High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning [arXiv 2023] Paper
  • StyleLipSync: Style-based Personalized Lip-sync Video Generation [arXiv 2023] Paper ProjectPage Code
  • GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation [arXiv 2023] Paper ProjectPage
  • High-Fidelity and Freely Controllable Talking Head Video Generation [CVPR 2023] Paper Project Page
  • One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field [CVPR 2023] Paper ProjectPage
  • Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert [CVPR 2023] Paper Code
  • Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations [arXiv 2023] Paper
  • That's What I Said: Fully-Controllable Talking Face Generation [arXiv 2023] Paper ProjectPage
  • Emotionally Enhanced Talking Face Generation [arXiv 2023] Paper Code ProjectPage
  • A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation [MLSys Workshop 2023] Paper
  • TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles [arXiv 2023] Paper
  • FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions [ICME 2023] Paper
  • DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder [arXiv 2023] Paper ProjectPage
  • DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions [ICASSP 2023] Paper Code ProjectPage
  • GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis [ICLR 2023] Paper Code ProjectPage
  • OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering [arXiv 2023] Paper Code
  • Style Transfer for 2D Talking Head Animation [arXiv 2023] Paper
  • READ Avatars: Realistic Emotion-controllable Audio Driven Avatars [arXiv 2023] Paper
  • On the Audio-visual Synchronization for Lip-to-Speech Synthesis [arXiv 2023] Paper
  • DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis [arXiv 2023] Paper
  • Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation [arXiv 2023] Paper ProjectPage
  • StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles [AAAI 2023] Paper Code
  • Audio-Visual Face Reenactment [WACV 2023] Paper ProjectPage Code


  • Memories are One-to-Many Mapping Alleviators in Talking Face Generation [arXiv 2022] Paper ProjectPage
  • Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers [SIGGRAPH Asia 2022] Paper
  • Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors [arXiv 2022] Paper ProjectPage
  • Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis [arXiv 2022] Paper ProjectPage
  • SPACE: Speech-driven Portrait Animation with Controllable Expression [arXiv 2022] Paper ProjectPage
  • Compressing Video Calls using Synthetic Talking Heads [BMVC 2022] Paper Project Page
  • Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement [arXiv 2022] Paper
  • StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation [arXiv 2022] Paper
  • Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control [arXiv 2022] Paper
  • EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model [SIGGRAPH 2022] Paper
  • Talking Head from Speech Audio using a Pre-trained Image Generator [ACM MM 2022] Paper
  • Latent Image Animator: Learning to Animate Images via Latent Space Navigation [ICLR 2022] Paper ProjectPage (note: this page has auto-play music...) Code
  • Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis [ECCV 2022] Paper ProjectPage Code
  • Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation [ECCV 2022] Paper ProjectPage Code
  • Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [ICASSP 2022] Paper ProjectPage Code
  • StableFace: Analyzing and Improving Motion Stability for Talking Face Generation [arXiv 2022] Paper ProjectPage
  • Emotion-Controllable Generalized Talking Face Generation [IJCAI 2022] Paper
  • StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN [arXiv 2022] Paper Code ProjectPage
  • DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering [arXiv 2022] Paper
  • Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions [arXiv 2022] Paper
  • Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels [TMM 2022] Paper
  • Depth-Aware Generative Adversarial Network for Talking Head Video Generation [CVPR 2022] Paper ProjectPage Code
  • Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning [CVPR 2022] Paper Code ProjectPage
  • Expressive Talking Head Generation with Granular Audio-Visual Control [CVPR 2022] Paper
  • Talking Face Generation with Multilingual TTS [CVPR 2022 Demo] Paper DemoPage
  • SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory [AAAI 2022] Paper


  • Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation [SIGGRAPH Asia 2021] Paper Code
  • Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis [ACMMM 2021] Paper Code
  • AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [ICCV 2021] Paper Code
  • FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning [ICCV 2021] Paper Code
  • Learned Spatial Representations for Few-shot Talking-Head Synthesis [ICCV 2021] Paper
  • Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [CVPR 2021] Paper Code ProjectPage
  • One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing [CVPR 2021] Paper
  • Audio-Driven Emotional Video Portraits [CVPR 2021] Paper Code
  • AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person [arXiv 2021] Paper
  • Talking Head Generation with Audio and Speech Related Facial Action Units [BMVC 2021] Paper
  • Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion [IJCAI 2021] Paper
  • Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation [AAAI 2021] Paper
  • Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [arXiv 2021] Paper Code


  • Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [arXiv 2020] Paper Code
  • A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild [ACMMM 2020] Paper Code
  • Talking Face Generation with Expression-Tailored Generative Adversarial Network [ACMMM 2020] Paper
  • Speech Driven Talking Face Generation from a Single Image and an Emotion Condition [arXiv 2020] Paper Code
  • A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors [ICPR 2020] Paper
  • Everybody's Talkin': Let Me Talk as You Want [arXiv 2020] Paper
  • HeadGAN: Video-and-Audio-Driven Talking Head Synthesis [arXiv 2020] Paper
  • Talking-head Generation with Rhythmic Head Motion [ECCV 2020] Paper
  • Neural Voice Puppetry: Audio-driven Facial Reenactment [ECCV 2020] Paper Project Code
  • Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis [CVPR 2020] Paper
  • Robust One Shot Audio to Video Generation [CVPRW 2020] Paper
  • MakeItTalk: Speaker-Aware Talking Head Animation [SIGGRAPH Asia 2020] Paper Code
  • FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis. [AAAI 2020] Paper
  • Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose [AAAI 2020] Paper
  • Photorealistic Lip Sync with Adversarial Temporal Convolutional [arXiv 2020] Paper
  • Animating Face using Disentangled Audio Representations [WACV 2020] Paper

Before 2020

  • Realistic Speech-Driven Facial Animation with GANs. [IJCV 2019] Paper ProjectPage
  • Few-Shot Adversarial Learning of Realistic Neural Talking Head Models [ICCV 2019] Paper Code
  • Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss [CVPR 2019] Paper Code
  • Talking Face Generation by Adversarially Disentangled Audio-Visual Representation [AAAI 2019] Paper Code ProjectPage
  • Lip Movements Generation at a Glance [ECCV 2018] Paper
  • X2Face: A network for controlling face generation using images, audio, and pose codes [ECCV 2018] Paper Code ProjectPage
  • Talking Face Generation by Conditional Recurrent Adversarial Network [IJCAI 2019] Paper Code
  • Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks [arXiv 2018] Paper
  • High-Resolution Talking Face Generation via Mutual Information Approximation [arXiv 2018] Paper
  • Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network [arXiv 2018] Paper
  • You said that? [BMVC 2017] Paper

2D Video - Person dependent

  • Synthesizing Obama: Learning Lip Sync from Audio [SIGGRAPH 2017] Paper Project Page
  • HMM-Based Photo-Realistic Talking Face Synthesis Using Facial Expression Parameter Mapping with Deep Neural Networks [Journal of Computer and Communications 2017] Paper
  • ObamaNet: Photo-realistic lip-sync from text [arXiv 2017] Paper
  • A deep bidirectional LSTM approach for video-realistic talking head [Multimedia Tools Appl 2015] Paper
  • Photo-Realistic Expressive Text to Talking Head Synthesis [Interspeech 2013] Paper
  • Expressive Speech-Driven Facial Animation [TOG 2005] Paper

3D Animation

  • EmoTalk: Speech-driven emotional disentanglement for 3D face animation [arXiv 2023] Paper ProjectPage
  • FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning [arXiv 2023] Paper Code ProjectPage
  • Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertices Attention [arXiv 2023] Paper
  • Learning Audio-Driven Viseme Dynamics for 3D Face Animation [arXiv 2023] Paper ProjectPage
  • CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior [arXiv 2023] Paper ProjectPage
  • Expressive Speech-driven Facial Animation with controllable emotions [arXiv 2023] Paper
  • Imitator: Personalized Speech-driven 3D Facial Animation [arXiv 2022] Paper ProjectPage
  • PV3D: A 3D Generative Model for Portrait Video Generation [arXiv 2022] Paper ProjectPage
  • Neural Emotion Director: Speech-preserving semantic control of facial expressions in “in-the-wild” videos [CVPR 2022] Paper Code
  • FaceFormer: Speech-Driven 3D Facial Animation with Transformers [CVPR 2022] Paper Code ProjectPage
  • LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization [CVPR 2021] Paper
  • MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [ICCV 2021] Paper
  • AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [ICCV 2021] Paper Code
  • 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head [arXiv 2021] Paper
  • Modality Dropout for Improved Performance-driven Talking Faces [ICMI 2020] Paper
  • Audio- and Gaze-driven Facial Animation of Codec Avatars [arXiv 2020] Paper
  • Capture, Learning, and Synthesis of 3D Speaking Styles [CVPR 2019] Paper
  • VisemeNet: Audio-Driven Animator-Centric Speech Animation [TOG 2018] Paper
  • Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks [TAC 2018] Paper
  • End-to-end Learning for 3D Facial Animation from Speech [ICMI 2018] Paper
  • Visual Speech Emotion Conversion using Deep Learning for 3D Talking Head [MMAC 2018]
  • A Deep Learning Approach for Generalized Speech Animation [SIGGRAPH 2017] Paper
  • Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion [TOG 2017] Paper
  • Speech-driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach [CVPR 2017]
  • Expressive Speech Driven Talking Avatar Synthesis with DBLSTM using Limited Amount of Emotional Bimodal Data [Interspeech 2016] Paper
  • Real-Time Speech-Driven Face Animation With Expressions Using Neural Networks [TONN 2012] Paper
  • Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar [SIST 2010] Paper



  • Deep Learning for Visual Speech Analysis: A Survey [arXiv 2022] Paper
  • What comprises a good talking-head video generation?: A Survey and Benchmark [arXiv 2020] Paper


