In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing
Official PyTorch implementation of the CVPR 2024 paper

Teaser image

In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing
Yiran Xu, Zhixin Shu, Cameron Smith, Seoung Wug Oh, and Jia-Bin Huang
https://in-n-out-3d.github.io/

Abstract: 3D-aware GANs offer new capabilities for view synthesis while preserving the editing functionalities of their 2D counterparts. GAN inversion is a crucial step that seeks the latent code to reconstruct input images or videos, subsequently enabling diverse editing tasks through manipulation of this latent code. However, a model pre-trained on a particular dataset (e.g., FFHQ) often has difficulty reconstructing images with out-of-distribution (OOD) objects such as faces with heavy make-up or occluding objects. We address this issue by explicitly modeling OOD objects from the input in 3D-aware GANs. Our core idea is to represent the image using two individual neural radiance fields: one for the in-distribution content and the other for the out-of-distribution object. The final reconstruction is achieved by optimizing the composition of these two radiance fields with carefully designed regularization. We demonstrate that our explicit decomposition alleviates the inherent trade-off between reconstruction fidelity and editability. We evaluate reconstruction accuracy and editability of our method on challenging real face images and videos and showcase favorable results against other baselines.
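For intuition, below is a minimal sketch of the kind of density-weighted composition described in the abstract: blending an in-distribution radiance field with an out-of-distribution one along each ray before volume rendering. This is not the repository's actual implementation; all names and tensor shapes are illustrative.

# Minimal sketch (not the actual implementation): composite an in-distribution
# radiance field with an out-of-distribution one along each ray, then volume-render.
import torch

def composite_two_fields(sigma_in, rgb_in, sigma_ood, rgb_ood, deltas, eps=1e-10):
    # sigma_in, sigma_ood: [N_rays, N_samples]     densities of the two fields
    # rgb_in,   rgb_ood:   [N_rays, N_samples, 3]  colors of the two fields
    # deltas:              [N_rays, N_samples]     distances between ray samples
    sigma = sigma_in + sigma_ood                                  # densities add
    rgb = (sigma_in[..., None] * rgb_in + sigma_ood[..., None] * rgb_ood) \
          / (sigma[..., None] + eps)                              # density-weighted color

    alpha = 1.0 - torch.exp(-sigma * deltas)                      # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1.0 - alpha + eps], dim=-1), dim=-1)[:, :-1]
    weights = alpha * trans                                       # rendering weights
    return (weights[..., None] * rgb).sum(dim=1), weights         # rendered color [N_rays, 3]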

Requirements

  • We recommend Linux for performance and compatibility reasons.
  • The code is built upon NVIDIA's eg3d repo.
  • 64-bit Python 3.8 and PyTorch 1.11.0 (or later). See https://pytorch.org for PyTorch install instructions. We tested our code on Python 3.9 and PyTorch 1.12.1.
  • Python libraries: see requirements.txt for library dependencies.
  • Set up environment with conda:
conda create -n in-n-out python=3.9
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
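Optionally, confirm the install from inside the in-n-out environment with a quick check:

# Quick environment check: prints the PyTorch version, the CUDA toolkit it was
# built with, and True if a GPU is visible.
import torch
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())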

Getting started

Please download a pre-trained EG3D checkpoint and put it at ./eg3d/pretrained_models:

mkdir -p eg3d/pretrained_models
wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/research/eg3d/1/files?redirect=true&path=ffhqrebalanced512-128.pkl' -O ./eg3d/pretrained_models/ffhqrebalanced512-128.pkl
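To verify the download, a small sanity check like the one below should work (run it from inside eg3d/; it follows the upstream EG3D loading convention and is an illustrative sketch, not part of this repo's pipeline):

# Sanity-check the downloaded generator pickle (run from inside eg3d/).
import dnnlib
import legacy

with dnnlib.util.open_url('pretrained_models/ffhqrebalanced512-128.pkl') as f:
    G = legacy.load_network_pkl(f)['G_ema']   # pre-trained EG3D generator
print(G.img_resolution, G.img_channels)       # image resolution / channels of the generator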

To test our code, we provide a pre-trained checkpoint here. Please download the checkpoint and place it at eg3d/ckpts.

Please download the data and unzip it at eg3d/data/wildvideos.

We also provide all StyleCLIP checkpoints here. Please download them and unzip them at eg3d/CLIPStyle/mapper_results. (e.g., unzip mapper_results.zip -d ./eg3d/CLIPStyle)

To edit a video, as an example, run

cd eg3d
bash scripts/run_test_styleclip.sh rednose2 eyeglasses ckpts/rednose2

Don't worry about the Missing key(s) error, as the eyeglasses mapper has no fine mapper. The results will be saved at eg3d/results/rednose2.

Preparing data

  1. Processed data. We provide a dataset of preprocessed data. Please download it and put it at eg3d/data/wildvideos.
  2. Your own data. This includes human face alignment and uses part of the code from the official EG3D repo. First, follow EG3D's instructions on setting up Deep3DFaceRecon_pytorch:
cd data_preprocessing/ffhq/
git clone https://github.com/sicxu/Deep3DFaceRecon_pytorch.git

Install Deep3DFaceRecon_pytorch following their instructions.

Also make sure you have their checkpoint file epoch_20.pth and place it at data_preprocessing/ffhq/Deep3DFaceRecon_pytorch/checkpoints/pretrained/epoch_20.pth.

We provide a script, batch_preprocess_in_the_wild.sh, to preprocess your own data of human faces. The script accepts the following folder tree (either a video or an image):

InputRoot
├── VideoName1
│   ├── frame1
│   ├── frame2
...
│   ├── frameN
└── ImageName1
    └── image1
...

Run

bash batch_preprocess_in_the_wild.sh ${InputRoot} ${OutputRoot} ${VideoName}
bash batch_preprocess_in_the_wild.sh ${InputRoot} ${OutputRoot} ${ImageName}
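If your raw footage is a single video file rather than a folder of frames, something like the sketch below can produce the per-frame layout shown above. It assumes opencv-python is available, and the paths and naming scheme are placeholders; the repo also ships eg3d/vid2frames.py, which likely covers this, so check its interface first.

# Illustrative sketch: extract frames from a video into InputRoot/VideoName1/
# (paths and frame naming are placeholders; requires opencv-python).
import os
import cv2

def video_to_frames(video_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f'{idx:05d}.png'), frame)
        idx += 1
    cap.release()

video_to_frames('my_video.mp4', 'InputRoot/VideoName1')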

Training

To train our model on a video, as an example, run

cd eg3d
bash scripts/run_train.sh rednose2 train

The results will be saved at ckpts/rednose2/train.

To run on your own data, run

bash scripts/run_train.sh ${videoname} ${expname}

OOD object removal

# change to eg3d
cd eg3d
# Here we try to use the pre-trained checkpoint. Suppose it has been placed at ./ckpts/rednose2/
# Remove the OOD object.
python outdomain/test_outdomain.py --remove_ood=true --smooth_out=true --network=pretrained_models/ffhqrebalanced512-128.pkl --ckpt_path=./ckpts/rednose2/triplanes.pt --target_path Path-to-rednose2 --latents_path ./ckpts/rednose2/triplanes.pt --outdir ./results/rednose2/eval/ood_removal_smoothed
# Please replace `Path-to-rednose2` with your own path.

# Save it as a video.
python frames2vid.py --frames_path ./results/rednose2/eval/ood_removal_smoothed/frames/projected_sr  --output_dir ./results/rednose2/eval/ood_removal_smoothed/frames/projected_sr.mp4

References

  1. EG3D, Chan et al. 2022
  2. Dynamic NeRF, Gao et al. 2021

Citation

@inproceedings{Xu2024inNout,
  author = {Xu, Yiran and Shu, Zhixin and Smith, Cameron and Oh, Seoung Wug and Huang, Jia-Bin},
  title = {In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing},
  booktitle = {CVPR},
  year = {2024}
}

Development

This is a research reference implementation and is treated as a one-time code drop. As such, we do not accept outside code contributions in the form of pull requests.

License

data_preprocessing/ffhq/3dface2idr_mat.py, data_preprocessing/ffhq/batch_preprocess_in_the_wild.sh, data_preprocessing/ffhq/draw_images_in_the_wild.py, data_preprocessing/ffhq/smooth_video_lms.py, data_preprocessing/ffhq/landmark68_5.py, eg3d/outdomain/*, eg3d/inversion/*, eg3d/frames2vid.py, eg3d/gen_3d_rgb.py, eg3d/vid2frames.py, eg3d/scripts/*, w_avg.pt, and other materials, including the model checkpoints and shell scripts, are licensed under CC BY-NC.

Files at eg3d/CLIPStyle/* are from StyleCLIP.

Files at eg3d/configs/* and eg3d/criteria are from PTI.

Other files at dataset_preprocessing, eg3d/dnnlib, eg3d/gui_utils, eg3d/torch_utils, eg3d/training, eg3d/camera_utils.py, eg3d/cammat2json.py, eg3d/gen_3d_rgb.py, eg3d/gen_samples.py, eg3d/gen_videos.py, and eg3d/legacy.py are licensed under the NVIDIA license.

Some images are from Unsplash under the standard Unsplash license.
