
πŸ”₯πŸ”₯πŸ”₯ GETAvatar: Generative Textured Meshes for Animatable Human Avatars (ICCV 2023) πŸ”₯πŸ”₯πŸ”₯
Official PyTorch implementation

GETAvatar: Generative Textured Meshes for Animatable Human Avatars
Xuanmeng Zhang*, Jianfeng Zhang*, Rohan Chacko, Hongyi Xu, Guoxian Song, Yi Yang, Jiashi Feng
Paper, Project Page

Abstract: We study the problem of 3D-aware full-body human generation, aiming at creating animatable human avatars with high-quality textures and geometries. Generally, two challenges remain in this field: i) existing methods struggle to generate geometries with rich realistic details such as the wrinkles of garments; ii) they typically utilize volumetric radiance fields and neural renderers in the synthesis process, making high-resolution rendering non-trivial. To overcome these problems, we propose GETAvatar, a Generative model that directly generates Explicit Textured 3D meshes for animatable human Avatar, with photo-realistic appearance and fine geometric details. Specifically, we first design an articulated 3D human representation with explicit surface modeling, and enrich the generated humans with realistic surface details by learning from the 2D normal maps of 3D scan data. Second, with the explicit mesh representation, we can use a rasterization-based renderer to perform surface rendering, allowing us to achieve high-resolution image generation efficiently. Extensive experiments demonstrate that GETAvatar achieves state-of-the-art performance on 3D-aware human generation both in appearance and geometry quality. Notably, GETAvatar can generate images at 512x512 resolution with 17FPS and 1024x1024 resolution with 14FPS, improving upon previous methods by 2x.

πŸ“’ News

  • [2023-10-19]: Code and pretrained model on THuman2.0 released! Check more details here

βš’οΈ Requirements

  • We recommend Linux for performance and compatibility reasons.
  • 1 – 8 high-end NVIDIA GPUs. We have done all testing and development using V100 GPUs.
  • 64-bit Python 3.8 and PyTorch 1.9.0. See https://pytorch.org for PyTorch install instructions.
  • CUDA toolkit 11.1 or later. (Why is a separate CUDA toolkit installation required? We use the custom CUDA extensions from the StyleGAN3 repo; please see Troubleshooting.)
  • Blender. Download Blender from the official website. We used blender-3.2.2-linux; we haven't tested other versions, but newer versions should work.
  • We also recommend installing Nvdiffrast following the instructions in its official repo, and installing Kaolin.
  • We provide a script to install the required packages; a rough sketch of the typical setup is shown below.
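
The commands below are only a rough sketch of a typical environment setup (the environment name, conda usage, and exact wheel sources are assumptions; the provided install script and the official Nvdiffrast/Kaolin instructions take precedence):

# Create a Python 3.8 environment (conda here is an assumption; any virtual environment works).
conda create -n getavatar python=3.8 -y
conda activate getavatar

# PyTorch 1.9.0 built against CUDA 11.1, matching the requirements above.
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

# Nvdiffrast (rasterization-based renderer) and Kaolin; check their official repos for a build matching your setup.
pip install git+https://github.com/NVlabs/nvdiffrast.git
pip install kaolin -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-1.9.0_cu111.html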

πŸƒβ€β™‚οΈ Getting Started

Clone the repository and download the necessary files:

git clone https://github.com/magic-research/GETAvatar.git
cd GETAvatar; mkdir cache; cd cache
wget https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/metrics/inception-2015-12-05.pkl

SMPL models

Download the SMPL human models (male, female, and neutral) from this link and the Mixamo motion sequences from here.

Place them as follows:

GETAvatar
|----smplx
    |----mocap
      |----mixamo
          |----0007  
          |----...
          |----0145  
    |----models
      |----smpl
          |----SMPL_FEMALE.pkl
          |----SMPL_MALE.pkl
          |----SMPL_NEUTRAL.pkl
|----...
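
A rough sketch of the corresponding shell commands (the /path/to/downloads source paths are placeholders for wherever you saved the files):

# Create the expected folder layout (matching the tree above).
mkdir -p smplx/models/smpl smplx/mocap/mixamo
# Copy the SMPL body models into place.
cp /path/to/downloads/SMPL_FEMALE.pkl /path/to/downloads/SMPL_MALE.pkl /path/to/downloads/SMPL_NEUTRAL.pkl smplx/models/smpl/
# Copy the Mixamo motion sequences (folders 0007 ... 0145) into place.
cp -r /path/to/downloads/mixamo/* smplx/mocap/mixamo/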

πŸ“ Preparing datasets

We train GETAvatar on 3D human scan datasets (THuman2.0 and RenderPeople). We use THuman2.0 as an example here because it is freely available; the same pipeline also works for the commercial RenderPeople dataset.

To begin, download the THuman2.0 dataset and the fitted SMPL results.

Place them as follows:

GETAvatar
|----datasets
    |----THuman2.0
        |----THuman2.0_Release
            |----0000
                |----0000.obj
                |----material0.jpeg
                |----material0.mtl
            |----...
            |----0525
        |----THuman2.0_smpl
            |----0000_smpl.pkl
            |----...
            |----0525_smpl.pkl

First, run the pre-processing script prepare_thuman_scans_smpl.py to align the human scans:

python3 prepare_thuman_scans_smpl.py --tot 1 --id 0

You can run multiple instances of the script in parallel by setting --tot to the total number of instances and --id to the rank of the current instance.
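
For instance, a minimal sketch that splits the scans across four parallel workers (the worker count is arbitrary):

# Each worker processes every 4th scan according to its rank.
for i in 0 1 2 3; do
    python3 prepare_thuman_scans_smpl.py --tot 4 --id $i &
done
wait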

Second, render the RGB image with blender:

blender --background test.blend --python render_aligned_thuman.py -- \
--device_id 0 --tot 1 --id 0

You can run multiple instances of the script in parallel by setting --device_id to the GPU device ID, --tot to the total number of instances, and --id to the rank of the current instance.
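
For instance, a sketch that maps four workers to four GPUs (the GPU indices and worker count are assumptions):

# Rank i renders its shard of the subjects on GPU i.
for i in 0 1 2 3; do
    blender --background test.blend --python render_aligned_thuman.py -- \
        --device_id $i --tot 4 --id $i &
done
wait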

Next, generate the camera pose and SMPL labels:

python3 prepare_thuman_json.py
python3 prepare_ext_smpl_json.py

Finally, render the normal images with pytorch3d:

python3 render_thuman_normal_map.py --tot 1 --id 0

You can run multiple instances of the script in parallel by setting --tot to the total number of instances and --id to the rank of the current instance, as in the previous steps.

The final structure of the training dataset is as follows:

GETAvatar
|----datasets
  |----THuman2.0_res512
      |----0000
          |----0000.png
          |----0001.png   
          |---- ...              
          |----0099.png  
          |----mesh.obj
          |----blender_transforms.json
      |----0001     
          |----...  
      |----0525   
          |----...
      |----aligned_camera_pose_smpl.json
      |----extrinsics_smpl.json
|----...

πŸ™‰ Inference

Download the pretrained model from here and save it into ./pretrained_model.
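
For example (THuman_512.pt is the checkpoint name used in the commands below; the download path is a placeholder):

mkdir -p pretrained_model
mv /path/to/downloads/THuman_512.pt pretrained_model/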

You can generate the multi-view visualization with gen_multi_view_3d.py. For example:

python3 gen_multi_view_3d.py --data=datasets/THuman2.0/THuman2.0_res512  --gpus=1 --batch=4 --batch-gpu=4 --mbstd-group=4 --gamma=10 --dmtet_scale=2 --one_3d_generator=1  --fp32=0  --img_res=512 --norm_interval=1 --dis_pose_cond=True  --normal_dis_pose_cond=True --eik_weight=1e-3  --unit_2norm=True --use_normal_offset=False --blur_rgb_image=False --blur_normal_image=False --camera_type=blender --load_normal_map=True --with_sr=True --seeds=0-3 --grid=2x2 --save_gif=False --render_all_pose=False --resume_pretrain=pretrained_model/THuman_512.pt  --output=output_videos/thu_512.mp4  --outdir=debug

You can set --img_res to the image resolution and --resume_pretrain to the checkpoint path.

You can generate animations with gen_animation_3d.py. For example:

python3 gen_animation_3d.py --data=datasets/THuman2.0/THuman2.0_res512   --gpus=1 --batch=4 --batch-gpu=4 --mbstd-group=4 --gamma=20 --dmtet_scale=2 --one_3d_generator=1  --fp32=0  --img_res=512 --norm_interval=1 --dis_pose_cond=True  --normal_dis_pose_cond=True --eik_weight=1e-3  --unit_2norm=True --use_normal_offset=False --blur_rgb_image=False  --blur_normal_image=False --camera_type=blender --load_normal_map=True  --with_sr=True --seeds=0-3 --grid=2x2 --save_gif=False --render_all_pose=False --action_type=0145 --frame_skip=1 --resume_pretrain=pretrained_model/THuman_512.pt --output=output_videos/thuman_mocap_0145.mp4 --outdir=debug

You can set the image resolution with --img_res, the checkpoint path with --resume_pretrain, and the motion sequence with --action_type.

πŸ™€ Train the model

You can train new models using train_3d.py. For example:

python3 train_3d.py  --data=datasets/THuman2.0/THuman2.0_res512  --gpus=8 --batch=32 --batch-gpu=4 --mbstd-group=4 --gamma=10 --dmtet_scale=2 --one_3d_generator=1  --fp32=0 --img_res=512 --norm_interval=1 --dis_pose_cond=True  --normal_dis_pose_cond=True --eik_weight=1e-3  --unit_2norm=True --use_normal_offset=False --blur_rgb_image=False --blur_normal_image=False --camera_type=blender --load_normal_map=True --with_sr=True --outdir=thuman_res512_ckpts

For distributed training, run the script dist_train.sh:

bash dist_train.sh

πŸ™ Credit

GETAvatar builds upon several previous works. We would like to thank the authors for their contributions to the community!

πŸŽ“ Citation

If you find this codebase useful for your research, please cite it using the following BibTeX entry.

@inproceedings{zhang2023getavatar,
    title={GETAvatar: Generative Textured Meshes for Animatable Human Avatars},
    author={Zhang, Xuanmeng and Zhang, Jianfeng and Chacko, Rohan and Xu, Hongyi and Song, Guoxian and Yang, Yi and Feng, Jiashi},
    booktitle={ICCV},
    year={2023}
}

License: MIT License

