SynergyNet

3DV 2021: Synergy between 3DMM and 3D Landmarks for Accurate 3D Facial Geometry

Cho-Ying Wu, Qiangeng Xu, Ulrich Neumann, CGIT Lab, University of Southern California

This paper supersedes the previous version of M3-LRN.

News: Our new work [Cross-Modal Perceptionist], which builds on this SynergyNet project, has been accepted to CVPR 2022.

Advantages

👍 State-of-the-art results on 3D facial alignment, face orientation estimation, and 3D face modeling.

👍 Fast inference: 3000 fps on a laptop RTX 2080.

👍 Simple implementation with only widely used operations.

(This project is built/tested on Python 3.8 and PyTorch 1.9 on a compatible GPU)

Single Image Inference Demo

Clone

git clone https://github.com/choyingw/SynergyNet

cd SynergyNet
Use conda

conda create --name SynergyNet python=3.8

conda activate SynergyNet
Install the common prerequisite packages

PyTorch 1.9 (should also be compatible with 1.0+ versions), Torchvision, OpenCV, SciPy, Matplotlib, Cython
Download the data [here] and [here], and extract them under the repository root.
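Before moving on, it can help to confirm that the packages above are importable. A minimal sketch using only the standard library; the import names are our assumption of how each package is imported (OpenCV as cv2, Cython as Cython), and the script is not part of the repository.

```python
import importlib
import importlib.util
from typing import Dict, Iterable, Optional

# Assumed import names for the packages listed above.
REQUIRED = ["torch", "torchvision", "cv2", "scipy", "matplotlib", "Cython"]


def check_packages(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each import name to its __version__ string, or None if it is missing."""
    report: Dict[str, Optional[str]] = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # not installed in the active environment
            continue
        module = importlib.import_module(name)
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        print(f"{name:12s} {version or 'MISSING'}")
```

Run it inside the activated SynergyNet environment; any line reading MISSING points to a package that still needs to be installed.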

These data are processed from [3DDFA] and [FSA-Net].

Download the pretrained weights [here] and put the model under 'pretrained/'.

Compile Sim3DR and FaceBoxes:

cd Sim3DR

./build_sim3dr.sh

cd ../FaceBoxes

./build_cpu_nms.sh

cd ..
Inference

python singleImage.py -f img

The default inference requires a compatible GPU. To run on a CPU instead, comment out the .cuda() calls and load the pretrained weights onto the CPU (e.g. with torch.load(..., map_location='cpu')).
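A minimal sketch of CPU-side weight loading. It assumes (not confirmed by this README) that the .pth.tar file stores its weights under a 'state_dict' key and that parameter names may carry a 'module.' prefix left by nn.DataParallel; the helper names are hypothetical. torch.load with map_location='cpu' is the standard PyTorch way to remap GPU-saved tensors onto the CPU.

```python
from typing import Dict, Mapping, TypeVar

V = TypeVar("V")


def strip_module_prefix(state_dict: Mapping[str, V], prefix: str = "module.") -> Dict[str, V]:
    """Drop a leading prefix (e.g. from nn.DataParallel) from every parameter name."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


def load_weights_on_cpu(path: str = "pretrained/best.pth.tar") -> dict:
    """Load a checkpoint onto the CPU without needing CUDA (hypothetical helper).

    Assumes the checkpoint is a dict holding the weights under "state_dict";
    otherwise it is treated as a raw state dict.
    """
    import torch  # imported lazily so strip_module_prefix works without PyTorch

    # On PyTorch >= 2.6, torch.load defaults to weights_only=True, which may
    # reject checkpoints containing non-tensor objects; the project targets 1.9.
    checkpoint = torch.load(path, map_location="cpu")
    if isinstance(checkpoint, dict):
        checkpoint = checkpoint.get("state_dict", checkpoint)
    return strip_module_prefix(checkpoint)


# Usage (model construction omitted; `model` is a placeholder for the network):
# model.load_state_dict(load_weights_on_cpu())
# model.eval()  # also make sure no .cuda() calls remain in the script
```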

Benchmark Evaluation

Follow Single Image Inference Demo: Steps 1-4
Benchmarking

python benchmark.py -w pretrained/best.pth.tar

The printed results and the visualizations of the first 50 examples are stored under 'results/' (see 'demo/' for some pre-generated samples for reference).

Update: We release our best head pose estimation [pretrained model] (mean MAE: 3.31), which is better than the number reported in the paper (3.35). Use -w to load a different pretrained model.
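For reference, a minimal NumPy sketch of the two AFLW2000-3D metrics, under commonly used conventions that we assume here rather than take from benchmark.py. Head pose MAE is the mean absolute error per Euler angle (yaw, pitch, roll, in degrees), and "Mean MAE" presumably averages the three. Landmark NME divides each face's mean 2D point-to-point error by sqrt(w * h) of the ground-truth landmark bounding box, as in 3DDFA-style evaluation.

```python
import numpy as np


def pose_mae(pred_deg: np.ndarray, gt_deg: np.ndarray) -> tuple:
    """Per-angle MAE and their mean for (N, 3) arrays of [yaw, pitch, roll] in degrees."""
    per_angle = np.abs(pred_deg - gt_deg).mean(axis=0)  # shape (3,)
    return per_angle, float(per_angle.mean())


def landmark_nme(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean NME over N faces for (N, K, 2) landmark arrays (K = 68 on AFLW2000-3D).

    Each face's mean point-to-point error is normalized by sqrt(w * h) of the
    bounding box spanned by its ground-truth landmarks (assumed convention).
    """
    errors = np.linalg.norm(pred - gt, axis=2).mean(axis=1)  # (N,)
    w = gt[:, :, 0].max(axis=1) - gt[:, :, 0].min(axis=1)
    h = gt[:, :, 1].max(axis=1) - gt[:, :, 1].min(axis=1)
    return float((errors / np.sqrt(w * h)).mean())
```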

Training

Follow Single Image Inference Demo: Steps 1-4.
Download the training data train_aug_120x120.zip from [3DDFA] and extract it under the root folder (about 680K images).
bash train_script.sh
Please refer to train_script.sh for hyperparameters such as the learning rate, number of epochs, and GPU device. The default settings use ~19 GB of memory on an RTX 3090 GPU and train in about 6 hours. If your GPU has less memory, decrease the batch size and learning rate proportionally.
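Scaling both values by the same factor keeps the ratio learning_rate / batch_size constant (the linear scaling rule). A minimal sketch; the default batch size and learning rate below are hypothetical placeholders, so read the real values from train_script.sh.

```python
def scale_hyperparams(default_batch: int, default_lr: float, new_batch: int) -> tuple:
    """Shrink the batch size and scale the learning rate by the same factor."""
    if new_batch <= 0 or default_batch <= 0:
        raise ValueError("batch sizes must be positive")
    factor = new_batch / default_batch
    return new_batch, default_lr * factor


# Hypothetical defaults (NOT taken from train_script.sh): batch 1024, lr 0.08.
DEFAULT_BATCH, DEFAULT_LR = 1024, 0.08

# A GPU with roughly half of the ~19 GB footprint: halve both values.
batch, lr = scale_hyperparams(DEFAULT_BATCH, DEFAULT_LR, DEFAULT_BATCH // 2)
print(batch, lr)
```

Memory use does not scale exactly linearly with batch size, so treat the halved batch as a starting point and adjust if you still run out of memory.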

Textured Artistic Face Meshes

Follow Single Image Inference Demo: Steps 1-5.
Download the artistic face data [here], which comes from [AF-Dataset], and our UV maps predicted by UV-texture GAN [here]. Extract them under the root folder.
python artistic.py -f art-all --png (whole folder)

python artistic.py -f art-all/122.png (single image)
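Both commands pass either a folder or a single file through -f. A minimal sketch of how such a -f/--png pair is commonly resolved; the helper is hypothetical, and we assume (check artistic.py for the actual logic) that a folder is globbed for *.png when --png is given, or *.jpg otherwise, while a file path is used as-is.

```python
import glob
import os
from typing import List


def collect_images(path: str, png: bool = False) -> List[str]:
    """Resolve a -f argument into a sorted list of image files (hypothetical helper)."""
    if os.path.isdir(path):
        pattern = "*.png" if png else "*.jpg"
        return sorted(glob.glob(os.path.join(path, pattern)))
    return [path]  # a single image path is passed through unchanged


# collect_images("art-all", png=True)   -> every .png file in art-all/
# collect_images("art-all/122.png")     -> ["art-all/122.png"]
```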

Note that this artistic face dataset contains faces with many different levels and styles of abstraction. Results on test images that are close to real faces are much better than those on highly abstract samples.

Textured Real Face Renderings

Follow Single Image Inference Demo: Steps 1-5.
Download our UV maps predicted by UV-texture GAN and the real face images for AFLW2000-3D [here]. Extract them under the root folder.
python uv_texture_realFaces.py -f texture_data/real --png (whole folder)

python uv_texture_realFaces.py -f texture_data/real/image00002_real_A.png (single image)

The results (3D meshes and renderings) are stored under 'inference_output'.

More Results

We show a comparison with [DECA] on the three samples with the largest roll angles in AFLW2000-3D.

Facial alignment on AFLW2000-3D (NME of facial landmarks):

Face orientation estimation on AFLW2000-3D (MAE of Euler angles):

Results on artistic faces:

Related Project

[Cross-Modal Perceptionist] (analysis of the relation between voice and 3D face)

Bibtex

If you find our work useful, please consider citing it:

@INPROCEEDINGS{wu2021synergy,
  author={Wu, Cho-Ying and Xu, Qiangeng and Neumann, Ulrich},
  booktitle={2021 International Conference on 3D Vision (3DV)}, 
  title={Synergy between 3DMM and 3D Landmarks for Accurate 3D Facial Geometry}, 
  year={2021}
  }

Acknowledgement

The project is developed on [3DDFA] and [FSA-Net]; we thank them for their wonderful work. We also thank [3DDFA-V2] for the face detector and rendering code.
