[ICCV 2023] One-shot Implicit Animatable Avatars with Model-based Priors

Home Page:https://huangyangyi.github.io/ELICIT/

One-shot Implicit Animatable Avatars with Model-based Priors [ICCV2023]

teaser.mp4

ELICIT creates free-viewpoint motion videos from a single image by constructing an animatable avatar NeRF representation via one-shot learning.

Official repository of "One-shot Implicit Animatable Avatars with Model-based Priors".

[Arxiv] [Website]

What Can You Learn from ELICIT?

  1. A data-efficient pipeline for creating a 3D animatable avatar from a single image.
  2. A CLIP-based semantic loss that infers the full 3D appearance of the human body with the help of a rough SMPL shape.
  3. A segmentation-based sampling strategy that creates more realistic visual details and geometries for 3D avatars.
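To illustrate the second idea, a CLIP-style semantic loss compares the embedding of a rendered view against the embedding of a reference image (or text prompt) and penalizes their dissimilarity. The sketch below is a minimal stand-in using plain NumPy vectors; the actual loss in ELICIT is computed on embeddings from a pretrained CLIP encoder, which is not reproduced here.

```python
import numpy as np

def clip_semantic_loss(render_emb: np.ndarray, target_emb: np.ndarray) -> float:
    """1 - cosine similarity between two CLIP-like embedding vectors.

    Both inputs are stand-ins for outputs of a pretrained CLIP image
    encoder; the real pipeline encodes the rendered view and the
    reference image and minimizes this dissimilarity.
    """
    a = render_emb / np.linalg.norm(render_emb)
    b = target_emb / np.linalg.norm(target_emb)
    return float(1.0 - a @ b)
```

Identical embeddings give a loss near 0, while orthogonal (semantically unrelated) embeddings give a loss near 1, so minimizing this term pulls rendered views toward the reference's semantics.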

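The third idea, segmentation-based sampling, can be sketched as biasing ray/pixel sampling toward segmented body regions so training spends more capacity on the avatar. The NumPy snippet below is an illustrative sketch only; the mask format and the `fg_weight` knob are assumptions, not the repository's actual implementation.

```python
import numpy as np

def sample_pixels(seg: np.ndarray, n: int, fg_weight: float = 4.0, rng=None) -> np.ndarray:
    """Sample n flat pixel indices, biased toward segmented (foreground) pixels.

    `seg` is a boolean HxW segmentation mask. `fg_weight` (hypothetical knob)
    controls how strongly sampling favors foreground over background pixels.
    """
    rng = rng or np.random.default_rng(0)
    weights = np.where(seg.ravel(), fg_weight, 1.0)   # heavier weight inside the mask
    probs = weights / weights.sum()
    return rng.choice(seg.size, size=n, replace=False, p=probs)
```

With `fg_weight=4.0` and a mask covering half the image, roughly 80% of sampled rays land on the subject, which concentrates supervision on the avatar's details and geometry.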
Installation

Please follow the Installation Instruction to set up all the required packages.

Data

Results of the experiments

We provide result videos on our webpage for the qualitative and quantitative evaluations in our paper. We also provide checkpoints for those experiments in Google Drive.

Training data for re-implementation

For the datasets we use for quantitative evaluations (ZJU-MoCap, Human 3.6M), please convert the original datasets into the same format as ZJU-MoCap. Then use our scripts in tools to preprocess the dataset and render SMPL meshes for training.

For customized single-image data, we provide examples from the DeepFashion dataset in dataset/fashion.

See more details in Data Instruction.

Getting Started

Training

python train.py --cfg configs/elicit/zju_mocap/377/smpl_init_texture.yaml # Run SMPL Meshes initialization.
python train.py --cfg configs/elicit/zju_mocap/377/finetune.yaml # Run training on the input subject.

We also provide checkpoints for all the subjects in Google Drive; please unzip the file into the following structure:

${ELICIT_ROOT}
    └── experiments
        └── elicit
            ├── zju_mocap
            ├── h36m
            └── fashion

Please refer to scripts for training all the quantitative experiments of novel pose synthesis and novel view synthesis on ZJU-MoCap and Human 3.6M.

Evaluation / Rendering

We also provide the rendered results of ELICIT and other baselines for all quantitative experiments in Google Drive. Please use the bounding masks in this file, which are generated by Neural Human Performer and Animatable NeRF, to calculate correct PSNR, SSIM, and LPIPS scores.
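For reference, a mask-restricted PSNR only measures error over pixels inside the bounding mask, which avoids inflating scores with easy background pixels. The NumPy sketch below is a minimal illustration under the assumption of images in [0, 1] and a boolean HxW mask; the paper's exact evaluation code and mask format may differ.

```python
import numpy as np

def masked_psnr(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray) -> float:
    """PSNR computed over masked pixels only.

    pred, gt: HxWx3 float images with values in [0, 1] (assumed range).
    mask: boolean HxW array selecting the pixels to evaluate.
    """
    diff = (pred - gt)[mask]            # keep only pixels inside the mask
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")             # identical images within the mask
    return 10.0 * np.log10(1.0 / mse)   # peak value is 1.0 for [0, 1] images
```

SSIM and LPIPS can be restricted the same way, e.g. by cropping to the mask's bounding box before calling a standard implementation.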

Evaluate novel pose synthesis.

python run.py --type movement --cfg configs/elicit/zju_mocap/377/finetune.yaml 

Evaluate novel view synthesis.

python run.py --type freeview --cfg configs/elicit/zju_mocap/377/finetune.yaml freeview.use_gt_camera True

Freeview rendering on arbitrary frames.

python run.py --type freeview  --cfg configs/elicit/zju_mocap/377/finetune.yaml freeview.frame_idx $FRAME_INDEX_TO_RENDER

The rendered frames and video will be saved at experiments/zju_mocap/377/latest.

Citation

@inproceedings{huang2022elicit,
  title={One-shot Implicit Animatable Avatars with Model-based Priors},
  author={Huang, Yangyi and Yi, Hongwei and Liu, Weiyang and Wang, Haofan and Wu, Boxi and Wang, Wenxiao and Lin, Binbin and Zhang, Debing and Cai, Deng},
  booktitle={IEEE/CVF International Conference on Computer Vision (ICCV)}, 
  year={2023}
}

Acknowledgments

Our implementation is mainly based on HumanNeRF, and takes reference from Animatable NeRF and AvatarCLIP. We thank the authors for their open-source contributions. In addition, we thank the authors of Animatable NeRF for their help with the data preprocessing of Human 3.6M.
