zhongshijun / EAT_code

Official code for ICCV 2023 paper: "Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation".

Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation (EAT)

Yuan Gan · Zongxin Yang · Xihang Yue · Lingyun Sun · Yi Yang

News:

  • 07/09/2023 Released the pre-trained weights and inference code.

Environment

We recommend trying the demo in Colab for the quickest setup.

We recommend using mamba, which is faster than conda, to install the dependencies.

conda/mamba env create -f environment.yml

Checkpoints & Demo Dependencies

In the EAT_code folder, use gdown (or download and unzip manually) to place ckpt, demo, and Utils into the corresponding folders:

gdown --id 1KK15n2fOdfLECWN5wvX54mVyDt18IZCo && unzip -q ckpt.zip -d ckpt
gdown --id 1MeFGC7ig-vgpDLdhh2vpTIiElrhzZmgT && unzip -q demo.zip -d demo
gdown --id 1HGVzckXh-vYGZEUUKMntY1muIbkbnRcd && unzip -q Utils.zip -d Utils
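After unzipping, a quick sanity check that the three asset folders landed in the right place can be sketched as follows (the folder names come from the commands above; the helper itself is ours, not part of the repo):

```python
from pathlib import Path

def check_assets(root="."):
    """Return the expected asset folders that are missing under root."""
    expected = ["ckpt", "demo", "Utils"]
    return [name for name in expected if not (Path(root) / name).is_dir()]

missing = check_assets()
if missing:
    print("Missing folders:", ", ".join(missing))
else:
    print("All demo dependencies in place.")
```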

Run demo

Activate our eat environment with conda activate eat, then run:

CUDA_VISIBLE_DEVICES=0 python demo.py --root_wav ./demo/video_processed/W015_neu_1_002 --emo hap

root_wav: ['obama', 'M003_neu_1_001', 'W015_neu_1_002', 'W009_sad_3_003', 'M030_ang_3_004'] (preprocessed wavs are in ./demo/video_processed/. The obama wav is about 5 minutes long; the others are much shorter.)

emo: ['ang', 'con', 'dis', 'fea', 'hap', 'neu', 'sad', 'sur']
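To generate all eight emotions for one preprocessed wav, the command shown above can be scripted. A minimal sketch (the demo.py flags and emotion codes are from this README; the helper function is hypothetical):

```python
import subprocess

# Emotion codes supported by demo.py, per the README.
EMOTIONS = ["ang", "con", "dis", "fea", "hap", "neu", "sad", "sur"]

def demo_commands(root_wav, gpu=0):
    """Build one demo.py invocation per emotion for a preprocessed wav."""
    return [
        f"CUDA_VISIBLE_DEVICES={gpu} python demo.py "
        f"--root_wav {root_wav} --emo {emo}"
        for emo in EMOTIONS
    ]

if __name__ == "__main__":
    for cmd in demo_commands("./demo/video_processed/W015_neu_1_002"):
        print(cmd)
        # subprocess.run(cmd, shell=True, check=True)  # uncomment to execute
```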

If you want to process your own video, please let us know. We will publish the preprocessing code as soon as possible.

TODO:

  • Add MEAD test
  • Preprocess Code
  • Evaluation Code
  • Training Dataset
  • Baselines

Acknowledgements

We thank these works for their public code and generous help: EAMM, OSFV (unofficial), AVCT, PC-AVS, among others.
