One-Shot Free-View Neural Talking Head Synthesis

Unofficial PyTorch implementation of the paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing".

I've only tested this with Python 3.6 and PyTorch 1.7.

Driving | FOMM | Ours:

Free-View:

Train:

python run.py --config config/vox-256.yaml --device_ids 0,1,2,3,4,5,6,7

Demo:

python demo.py --config config/vox-256.yaml --checkpoint path/to/checkpoint --source_image path/to/source --driving_video path/to/driving --relative --adapt_scale --find_best_frame

free-view (e.g. yaw=20, pitch=roll=0):

python demo.py --config config/vox-256.yaml --checkpoint path/to/checkpoint --source_image path/to/source --driving_video path/to/driving --relative --adapt_scale --find_best_frame --free_view --yaw 20 --pitch 0 --roll 0

Note: first run python crop-video.py --inp driving_video.mp4 to get a cropping suggestion, then crop the raw driving video accordingly.
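
To render the same source from several viewpoints, the free-view command can be assembled in a small script. The sketch below is only an illustration: build_demo_command is a hypothetical helper that is not part of this repository, the paths are the same placeholders used in the commands above, and only flags that appear in this README are used.

```python
import shlex


def build_demo_command(yaw, pitch=0, roll=0,
                       config="config/vox-256.yaml",
                       checkpoint="path/to/checkpoint",
                       source_image="path/to/source",
                       driving_video="path/to/driving"):
    """Return the demo.py argument list for one free-view pose.

    Hypothetical helper: it only assembles the flags listed in this README.
    """
    return [
        "python", "demo.py",
        "--config", config,
        "--checkpoint", checkpoint,
        "--source_image", source_image,
        "--driving_video", driving_video,
        "--relative", "--adapt_scale", "--find_best_frame",
        "--free_view",
        "--yaw", str(yaw), "--pitch", str(pitch), "--roll", str(roll),
    ]


# Sweep the yaw angle while keeping pitch and roll at 0.
commands = [build_demo_command(yaw) for yaw in (-20, 0, 20)]
for cmd in commands:
    # Print a shell-safe version of each command for inspection.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each argument list can be passed to subprocess.run(cmd, check=True) to actually launch the demo. Whether successive runs overwrite the same output file depends on demo.py, so check that before running a sweep.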

Pretrained Model:

Model	Train Set	Baidu Netdisk	MediaFire
Vox-256-Beta	VoxCeleb-v1	Baidu (PW: c0tc)	MF
Vox-256-Stable	VoxCeleb-v1	soon	soon
Vox-256	VoxCeleb-v2	soon	soon
Vox-512	VoxCeleb-v2	soon	soon

Note:

At present, the Beta version is not well tuned: the synthesized images lack sharpness, and the mouth shape and eyes are not very accurate.
For free-view synthesis, it is recommended to keep yaw, pitch and roll within ±45°, ±20° and ±20° respectively.
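
The recommended angle ranges can be enforced before calling the demo. This is a minimal sketch, assuming the angles are given in degrees; clamp_pose and RECOMMENDED_LIMITS are hypothetical names and are not part of this repository.

```python
# Recommended limits in degrees, taken from the note above.
RECOMMENDED_LIMITS = {"yaw": 45.0, "pitch": 20.0, "roll": 20.0}


def clamp_pose(yaw, pitch, roll):
    """Clamp each angle (in degrees) to its recommended symmetric range."""
    requested = {"yaw": yaw, "pitch": pitch, "roll": roll}
    return {
        name: max(-limit, min(limit, float(requested[name])))
        for name, limit in RECOMMENDED_LIMITS.items()
    }


# A yaw of 60 and a pitch of -30 fall outside the recommended ranges.
print(clamp_pose(60, -30, 5))
# {'yaw': 45.0, 'pitch': -20.0, 'roll': 5.0}
```

Clamping silently changes the requested pose. Raising an error or printing a warning instead may be preferable when the exact angle matters.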

Acknowledgement:

Thanks to NV, AliaksandrSiarohin and DeepHeadPose.
