This is a repository for organizing papers, code, and other resources related to talking face/head generation. Most papers link to the PDF provided by arXiv or OpenAccess. However, some papers require an academic license to browse, e.g., IEEE, Springer, and Elsevier journals.
This project is still ongoing; pull requests are welcome!
If you have any suggestions (missing papers, new papers, key researchers, or typos), please feel free to edit and submit a pull request. Even just letting me know the titles of papers is a big contribution. You can do this by opening an issue or contacting me directly via email.
If you find this repo useful, please star it!
2022.09 Update!
Thanks for the PRs, everybody! From now on, I'll occasionally include papers on video-driven talking face generation, because I found that the community is starting to include video-driven methods in the scope of talking face generation, even though that task was originally termed Face Reenactment.
So, if you are looking for video-driven talking face generation, I suggest you star this repo and also search for Face Reenactment; you'll find more :)
One more thing: please correct me if you find that any paper noted here as an arXiv paper has since been accepted to a conference or journal.
2021.11 Update!
I updated a batch of papers that appeared in the past few months. I originally intended this repo to cover audio-driven talking face generation works. However, I found several text-based research works that are also very interesting, so I included them here. Enjoy!
TO DO LIST
- Main paper list
- Add paper link
- Add code links where available
- Add project pages where available
- Datasets and survey
Papers
2D Video - Person independent
- DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis [arXiv 2023] Paper
- Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation [arXiv 2023] Paper ProjectPage
- StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles [AAAI 2023] Paper Code
- Audio-Visual Face Reenactment [WACV 2023] Paper Project Page Code
- Memories are One-to-Many Mapping Alleviators in Talking Face Generation [arXiv 2022] Paper ProjectPage
- Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers [SIGGRAPH Asia 2022] Paper
- Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors [arXiv 2022] Paper ProjectPage
- Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis [arXiv 2022] Paper ProjectPage
- SPACE: Speech-driven Portrait Animation with Controllable Expression [arXiv 2022] Paper ProjectPage
- Compressing Video Calls using Synthetic Talking Heads [BMVC 2022] Paper Project Page
- Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement [arXiv 2022] Paper
- StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation [arXiv 2022] Paper
- Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control [arXiv 2022] Paper
- EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model [SIGGRAPH 2022] Paper
- Talking Head from Speech Audio using a Pre-trained Image Generator [ACM MM 2022] Paper
- Latent Image Animator: Learning to Animate Images via Latent Space Navigation [ICLR 2022] Paper ProjectPage(note this page has auto-play music...) Code
- Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis [ECCV 2022] Paper ProjectPage Code
- Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation [ECCV 2022] Paper ProjectPage Code
- Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [ICASSP 2022] Paper ProjectPage Code
- StableFace: Analyzing and Improving Motion Stability for Talking Face Generation [arXiv 2022] Paper ProjectPage
- Emotion-Controllable Generalized Talking Face Generation [IJCAI 2022] Paper
- StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN [arXiv 2022] Paper Code ProjectPage
- DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering [arXiv 2022] Paper
- Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions [arXiv 2022] Paper
- Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels [TMM 2022] Paper
- Depth-Aware Generative Adversarial Network for Talking Head Video Generation [CVPR 2022] Paper ProjectPage Code
- Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning [CVPR 2022] Paper Code ProjectPage
- Expressive Talking Head Generation with Granular Audio-Visual Control [CVPR 2022] Paper
- Talking Face Generation with Multilingual TTS [CVPR 2022 Demo] Paper DemoPage
- SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory [AAAI 2022] Paper
- Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation [SIGGRAPH Asia 2021] Paper Code
- Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis [ACMMM 2021] Paper Code
- AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [ICCV 2021] Paper Code
- FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning [ICCV 2021] Paper Code
- Learned Spatial Representations for Few-shot Talking-Head Synthesis [ICCV 2021] Paper
- Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [CVPR 2021] Paper Code ProjectPage
- One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing [CVPR 2021] Paper
- Audio-Driven Emotional Video Portraits [CVPR 2021] Paper Code
- AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person [arXiv 2021] Paper
- Talking Head Generation with Audio and Speech Related Facial Action Units [BMVC 2021] Paper
- Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion [IJCAI 2021] Paper
- Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation [AAAI 2021] Paper
- Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [arXiv 2021] Paper Code
- Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [arXiv 2020] Paper Code
- A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild [ACMMM 2020] Paper Code
- Talking Face Generation with Expression-Tailored Generative Adversarial Network [ACMMM 2020] Paper
- Speech Driven Talking Face Generation from a Single Image and an Emotion Condition [arXiv 2020] Paper Code
- A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors [ICPR 2020] Paper
- Everybody's Talkin': Let Me Talk as You Want [arXiv 2020] Paper
- HeadGAN: Video-and-Audio-Driven Talking Head Synthesis [arXiv 2020] Paper
- Talking-head Generation with Rhythmic Head Motion [ECCV 2020] Paper
- Neural Voice Puppetry: Audio-driven Facial Reenactment [ECCV 2020] Paper Project Code
- Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis [CVPR 2020] Paper
- Robust One Shot Audio to Video Generation [CVPRW 2020] Paper
- MakeItTalk: Speaker-Aware Talking Head Animation [SIGGRAPH Asia 2020] Paper
- FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis [AAAI 2020] Paper
- Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose [AAAI 2020] Paper
- Photorealistic Lip Sync with Adversarial Temporal Convolutional [arXiv 2020] Paper
- Speech-Driven Facial Animation Using Polynomial Fusion of Features [arXiv 2020] Paper
- Animating Face using Disentangled Audio Representations [WACV 2020] Paper
- Realistic Speech-Driven Facial Animation with GANs [IJCV 2019] Paper ProjectPage
- Few-Shot Adversarial Learning of Realistic Neural Talking Head Models [ICCV 2019] Paper Code
- Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss [CVPR 2019] Paper Code
- Talking Face Generation by Adversarially Disentangled Audio-Visual Representation [AAAI 2019] Paper Code ProjectPage
- Lip Movements Generation at a Glance [ECCV 2018] Paper
- X2Face: A network for controlling face generation using images, audio, and pose codes [ECCV 2018] Paper Code ProjectPage
- Talking Face Generation by Conditional Recurrent Adversarial Network [IJCAI 2019] Paper Code
- Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks [arXiv 2018] Paper
- High-Resolution Talking Face Generation via Mutual Information Approximation [arXiv 2018] Paper
- Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network [arXiv 2018] Paper
- You said that? [BMVC 2017] Paper
2D Video - Person dependent
- Synthesizing Obama: Learning Lip Sync from Audio [SIGGRAPH 2017] Paper Project Page
- Photorealistic Adaptation and Interpolation of Facial Expressions Using HMMs and AAMs for Audio-Visual Speech Synthesis [ICIP 2017] Paper
- HMM-Based Photo-Realistic Talking Face Synthesis Using Facial Expression Parameter Mapping with Deep Neural Networks [Journal of Computer and Communications 2017] Paper
- ObamaNet: Photo-realistic lip-sync from text [arXiv 2017] Paper
- A deep bidirectional LSTM approach for video-realistic talking head [Multimedia Tools Appl 2015] Paper
- Photo-Realistic Expressive Text to Talking Head Synthesis [Interspeech 2013] Paper
- Photo-Real Talking Head with Deep Bidirectional LSTM [ICASSP 2015] Paper
- Expressive Speech-Driven Facial Animation [TOG 2005] Paper
3D Animation
- Learning Audio-Driven Viseme Dynamics for 3D Face Animation [arXiv 2023] Paper ProjectPage
- CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior [arXiv 2023] Paper ProjectPage
- Expressive Speech-driven Facial Animation with controllable emotions [arXiv 2023] Paper
- Imitator: Personalized Speech-driven 3D Facial Animation [arXiv 2022] Paper ProjectPage
- PV3D: A 3D Generative Model for Portrait Video Generation [arXiv 2022] Paper ProjectPage
- Neural Emotion Director: Speech-preserving semantic control of facial expressions in "in-the-wild" videos [CVPR 2022] Paper Code
- FaceFormer: Speech-Driven 3D Facial Animation with Transformers [CVPR 2022] Paper Code ProjectPage
- LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization [CVPR 2021] Paper
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [ICCV 2021] Paper
- AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [ICCV 2021] Paper Code
- 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head [arXiv 2021] Paper
- Modality Dropout for Improved Performance-driven Talking Faces [ICMI 2020] Paper
- Audio- and Gaze-driven Facial Animation of Codec Avatars [arXiv 2020] Paper
- Capture, Learning, and Synthesis of 3D Speaking Styles [CVPR 2019] Paper
- VisemeNet: Audio-Driven Animator-Centric Speech Animation [TOG 2018] Paper
- Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks [TAC 2018] Paper
- End-to-end Learning for 3D Facial Animation from Speech [ICMI 2018] Paper
- Visual Speech Emotion Conversion using Deep Learning for 3D Talking Head [MMAC 2018]
- A Deep Learning Approach for Generalized Speech Animation [SIGGRAPH 2017] Paper
- Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion [TOG 2017] Paper
- Speech-driven 3D Facial Animation with Implicit Emotional Awareness A Deep Learning Approach [CVPR 2017]
- Expressive Speech Driven Talking Avatar Synthesis with DBLSTM using Limited Amount of Emotional Bimodal Data [Interspeech 2016] Paper
- Real-Time Speech-Driven Face Animation With Expressions Using Neural Networks [TONN 2012] Paper
- Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar [SIST 2010] Paper
Datasets
- TalkingHead-1KH Link
- MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV 2020] ProjectPage
- VoxCeleb Link
- LRW Link
- LRS2 Link
- GRID Link
- CREMA-D Link