py_Audio2Face

Generate expressive facial animation from audio

Overview

This Python script leverages the headless mode of Audio2Face to generate animations for characters:

lip movement animations
face animations
emotions (new since 2023.2.0)
It provides methods to control the Audio2Face headless server and interact with it through a requests API.

Use cases:

Power your games and movies with expressive facial animations.
Create natural looking virtual avatars.
Generate animations for characters and export them as USD files for example for Maya or Unreal Engine 5.
Stream audio data to the Audio2Face server to generate animations in real-time. Use Live-Lynk to use them in UE5

Demo and Tutorial

Installation

Install the py_audio2face package using pip:

# With pip. Note: this version does not include the streaming feature
pip install py_audio2face
# Install also the streaming feature. This includes additional dependencies like grpcio and protobuf
pip install py_audio2face[streaming]
# Or install the latest version from GitHub to work with the newest version of Audio2Face
pip install git+https://github.com/SocAIty/py_audio2face.git

Modify the ROOT_DIR, DEFAULT_OUTPUT_DIR, and DEFAULT_AUDIO_STREAM_GRPC_PORT variables in the settings.py file as needed.

Prerequisites

Audio2Face installed on the system
Python 3.x

Usage

First initialize Audio2Face instance:

import py_audio2face as pya2f
a2f = pya2f.Audio2Face()

Generate animation for audio files:

# Generate animation for a single audio file
a2f.audio2face_single(
   audio_file_path="path/to/audio/file.wav", #  path of the audio file you want to animate
   output_path="path/to/output/animation.usd",  # path where the animation file will be saved
   fps=60, # frames per second for the animation. Higher fps will result in smoother animations and longer processing time
   emotion_auto_detect=True  # automatically detect emotions in the audio file. If false the set emotion will be used
)

# Generate animation for an entire folder of audio files
a2f.audio2face_folder(input_folder="path/to/my/folder", output_folder='/output', fps=60)

Configure Emotions

The emotion mixin let's you control the strength of the emotions in the generated animation.

# Applies a preferred emotion even in emotion_auto_detect with update_settings=True
a2f.set_emotion(anger=0.9, disgust=0.5, fear=0.1, sadness=0.2, update_settings=True)
# Animate with a fixed set of emotione. 
animated = a2f.audio2face_single(test_audio_1,'emotion_full.usd', fps=24, emotion_auto_detect=False)
# Testing with expressive emotions. 
with_preferred_emotion_detect = a2f.audio2face_single(test_audio_1,'emotion_preset_detect.usd', fps=24, emotion_auto_detect=True)

You can also configure the emotion detection behaviour with the set_settings method.

a2f.a2e_set_settings(a2e_emotion_strength=0.5, a2e_smoothing_exp=0)

Stream audio to Audio2Face:

Instead of providing paths to the a2f headless server, you can stream the audio data directly to the server. This is useful if you want to generate animations in real-time, for a live stream or in a server setting.

For this example we use the media-toolkit to stream audio data. Install it with pip install media-toolkit[AudioFile]".

from media_toolkit import AudioFile
audio = AudioFile().from_file("path/to/audio/file.wav")
audio_stream = audio.to_stream()  # note: this can be any python generator that yields numpy arrays/bytes of audio data
a2f.stream_audio(audio_data, output_path="path/to/output/animation.usd", fps=60)

For streaming under the hood, a different scene with a streaming audio player is loaded in the init method. Then with gRPC requests, the audio data is streamed to the server.

Shutdown Audio2Face Server:

a2f.shutdown_a2f()

Related Projects

Why bother about recording audio files?

Convert text-to-speech with SpeechCraft. Use the natural sounding speech and feed it into audio2face.
Want sound natively like any other character? Use RVC to clone any voice. This sounds so real, you'll not notice the difference to a real one.
Create and animate realistic looking characters with MetaHuman and audio2face

Contribute

Any contribution is appreciated.

streaming: better clean-code for setting streaming mode. Especially with emotions, and default isntance.
streaming: upgrade to protobuf 5
streaming: allow streaming back of blendshapes or on the fly export
create working unit tests
Provide different characters not only mark.usd
Allow multiple generations / streams at the same time

Please raise an issue if you have any suggestions, feature requests or need help with the script.

Acknowledgements

Big thank you for the NVIDIA Team who made Audio2Face a great tool.

SocAIty / py_audio2face