Im4D: High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes

Project Page | Paper | Video

High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes
Haotong Lin, Sida Peng, Zhen Xu, Tao Xie, Xingyi He, Hujun Bao and Xiaowei Zhou
SIGGRAPH Asia 2023 conference track

Installation

Set up the python environment

Tested on an Ubuntu workstation with an i9-12900K CPU and an RTX 3090 GPU.

conda create -n im4d python=3.10
conda activate im4d
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia # pytorch 2.0.1
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch 
pip install -r requirments.txt

Set up datasets

0. Set up workspace

The workspace is the disk directory that stores datasets, training logs, checkpoints and results. Please ensure it has enough disk space.

export workspace=$PATH_TO_YOUR_WORKSPACE
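Because the workspace holds datasets, logs, checkpoints and results, it can be worth checking the free space before downloading anything. The following is a minimal sketch, not part of the repository: `check_workspace` is a hypothetical helper, and the `min_free_gb` threshold is an illustrative value rather than a requirement stated by the authors.

```python
import shutil
from pathlib import Path


def check_workspace(path, min_free_gb=100.0):
    """Return (ok, free_gib) for the directory used as the workspace.

    min_free_gb is an illustrative threshold (assumption), not a value
    documented by the project.
    """
    root = Path(path).expanduser()
    if not root.is_dir():
        raise FileNotFoundError(f"workspace directory does not exist: {root}")
    # disk_usage reports bytes for the filesystem that contains `root`.
    free_gib = shutil.disk_usage(root).free / 1024**3
    return free_gib >= min_free_gb, free_gib
```

For example, `check_workspace(os.environ["workspace"])` (after `import os`) returns whether the directory exported above meets the threshold, along with the free space in GiB.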

1. Prepare ZJU-MoCap and NHR datasets.

Please refer to mlp_maps to download ZJU-MoCap and NHR datasets. After downloading, place them into $workspace/zju-mocap and $workspace/NHR, respectively.

2. Prepare the DNA-Rendering dataset.

Since the license of the DNA-Rendering dataset does not allow redistribution, we cannot release the processed dataset publicly. You can download the DNA-Rendering dataset here or from OpenXLab. If you are interested in the processed data, please email me (haotongl@outlook.com). Please cite DNA-Rendering if you find this data useful.

Pre-trained models

Download the pre-trained models from this link for a quick test. Place FILENAME.pth at
$workspace/trained_model/SCENE/im4d/FILENAME/latest.pth.
For example:
my_313.pth -> $workspace/trained_model/my_313/im4d/my_313/latest.pth
my_313_demo.pth -> $workspace/trained_model/my_313/im4d/my_313_demo/latest.pth
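The placement rule above can be sketched as a small helper. This is an illustration only: `checkpoint_destination` and `install_checkpoint` are hypothetical names, not functions from the repository. The scene name is passed explicitly because it cannot always be derived from the file name (my_313_demo.pth belongs to the scene my_313).

```python
import shutil
from pathlib import Path


def checkpoint_destination(workspace, scene, filename):
    """Map FILENAME.pth to <workspace>/trained_model/SCENE/im4d/FILENAME/latest.pth."""
    stem = Path(filename).stem  # "my_313_demo.pth" -> "my_313_demo"
    return Path(workspace) / "trained_model" / scene / "im4d" / stem / "latest.pth"


def install_checkpoint(src, workspace, scene):
    """Copy a downloaded checkpoint into the layout described above."""
    dst = checkpoint_destination(workspace, scene, Path(src).name)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return dst
```

With this sketch, `install_checkpoint("my_313_demo.pth", workspace, "my_313")` copies the file to `trained_model/my_313/im4d/my_313_demo/latest.pth` under the workspace.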

Testing

1. Reproduce the quantitative results in the paper.

python run.py --type evaluate --cfg_file configs/exps/im4d/xx_dataset/xx_scene.yaml save_result True

For the NHR dataset, please first download the preprocessed data and place it into $workspace/evaluation. This evaluation setting is taken from mlp_maps. Then run one more command to report the PSNR metric:

python scripts/evaluate/im4d/eval_nhr.py --gt_path $workspace/evaluation/sport_1_easymocap --output_path $workspace/result/sport_1_easymocap/im4d/sport1_release/default/step00999999/rgb_0
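For reference, PSNR is defined as 10 * log10(MAX^2 / MSE). The sketch below computes it over flat lists of pixel values in [0, 1]. It only illustrates the standard definition; the `psnr` function here is a hypothetical helper, and eval_nhr.py may apply additional steps (such as masking or cropping) that are not shown.

```python
import math


def psnr(pred, gt, max_value=1.0):
    """Peak signal-to-noise ratio between two equally sized pixel sequences."""
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the rendered image is closer to the ground truth; identical inputs give infinity.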

2. Accelerate rendering.

First, precompute the binary fields.

python run.py --type cache_grid --cfg_file configs/exps/im4d/renbody/0013_01.yaml --configs configs/components/opts/cache_grid.yaml grid_tag default

You may need to change the frame range and grid_resolution to fit your scene. For example, the ZJU-MoCap scene has 300 frames and its height runs along the z-axis, so the grid uses a larger resolution along z:

python run.py --type cache_grid --cfg_file configs/exps/im4d/zju/my_313.yaml --configs configs/components/opts/cache_grid.yaml grid_tag default grid_resolution 128,128,256 test_dataset.frame_sample 0,300,1
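To get a feel for these two options, the sketch below expands a frame_sample string into frame indices and estimates the size of the cached binary fields. Both interpretations are assumptions, not confirmed by the source: frame_sample is read as start,stop,step with an exclusive stop (which matches 0,300,1 covering 300 frames), and the storage estimate assumes one bit per voxel. All function names are hypothetical.

```python
def parse_int_triplet(text):
    """Parse a string such as "128,128,256" into a tuple of three ints."""
    parts = [int(p) for p in text.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected three comma-separated integers, got {text!r}")
    return tuple(parts)


def frame_indices(frame_sample):
    """Expand "start,stop,step" into frame indices (assumed range-like semantics)."""
    start, stop, step = parse_int_triplet(frame_sample)
    return list(range(start, stop, step))


def binary_field_bytes(grid_resolution, num_frames):
    """Estimate storage for one bit per voxel per frame (assumed packing)."""
    x, y, z = parse_int_triplet(grid_resolution)
    voxels = x * y * z * num_frames
    return (voxels + 7) // 8  # round up to whole bytes
```

Under these assumptions, a 128,128,256 grid over 300 frames holds about 1.26 billion voxels, or roughly 150 MiB when bit-packed. Doubling the resolution along every axis multiplies this by eight.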

Then, render images with the precomputed binary fields.

python run.py --type evaluate --cfg_file configs/exps/im4d/renbody/0013_01.yaml --configs configs/components/opts/fast_render.yaml grid_tag default save_result True

You may try slightly decreasing sigma_thresh (default: 5.0) to preserve more voxels.
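The effect of the threshold can be illustrated with a toy example. This sketch assumes that a voxel is kept when its density exceeds sigma_thresh; the repository's actual criterion may differ, and `occupancy` is a hypothetical helper with made-up density values.

```python
def occupancy(densities, sigma_thresh=5.0):
    """Mark a voxel as occupied when its density exceeds the threshold (assumed rule)."""
    return [d > sigma_thresh for d in densities]


# Made-up densities for five voxels, purely for illustration.
densities = [0.5, 3.0, 4.8, 5.2, 12.0]
kept_default = sum(occupancy(densities, sigma_thresh=5.0))
kept_lowered = sum(occupancy(densities, sigma_thresh=4.5))
```

Here the voxel with density 4.8 is discarded at the default threshold but kept at 4.5, which is why a lower threshold preserves thin or semi-transparent structures at the cost of rendering more voxels.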

3. Render a video with the selected trajectory.

python run.py --type evaluate --cfg_file configs/exps/im4d/renbody/0013_01.yaml --configs configs/components/opts/render_path/renbody_path.yaml

To render the video with the precomputed binary fields, add one more argument:

python run.py --type evaluate --cfg_file configs/exps/im4d/renbody/0013_01.yaml --configs configs/components/opts/render_path/renbody_path.yaml --configs configs/components/opts/fast_render.yaml

For better performance, you can use our pre-trained demo models, which are trained with all camera views.

python run.py --type evaluate --cfg_file configs/exps/im4d/zju/my_313.yaml   --configs configs/components/opts/fast_render.yaml --configs configs/components/opts/render_path/zju_path.yaml exp_name_tag demo

Training

python train_net.py --cfg_file configs/exps/im4d/xx_dataset/xx_scene.yaml

Training with multiple GPUs:

export CUDA_VISIBLE_DEVICES=0,1,2,3
export NUM_GPUS=4
export LOG_LEVEL=WARNING # INFO, DEBUG, WARNING
torchrun --nproc_per_node=$NUM_GPUS train_net.py --cfg_file configs/exps/im4d/xx_dataset/xx_scene.yaml --log_level $LOG_LEVEL distributed True
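torchrun starts one process per GPU and passes each its position in the job through the RANK, LOCAL_RANK and WORLD_SIZE environment variables. The sketch below shows one way a script could read them, falling back to single-process defaults. It is not taken from train_net.py, and `distributed_env` is a hypothetical helper.

```python
import os


def distributed_env(environ=None):
    """Read the per-process variables set by torchrun, with single-GPU defaults."""
    env = os.environ if environ is None else environ
    return {
        "rank": int(env.get("RANK", 0)),              # global process index
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # GPU index on this node
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of processes
    }
```

With NUM_GPUS=4 on a single node, each of the four processes sees a world_size of 4 and a distinct local_rank from 0 to 3.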

Running on a custom dataset

For both studio and in-the-wild datasets, please refer to this tutorial.

Acknowledgements

We would like to acknowledge the following inspiring prior work:

IBRNet: Learning Multi-View Image-Based Rendering (Wang et al.)
ENeRF: Efficient Neural Radiance Fields for Interactive Free-viewpoint Video (Lin et al.)
K-Planes: Explicit Radiance Fields in Space, Time, and Appearance (Fridovich-Keil et al.)

Big thanks to NeRFAcc (Li et al.) for their efficient implementation, which has significantly accelerated our rendering.

Recently, in the course of refining our codebase, we have incorporated basic implementations of ENeRF and K-Planes. These additions, although not yet thoroughly tested or aligned with the official code, could serve as useful resources for further exploration and development.

Citation

If you find this code useful for your research, please use the following BibTeX entry:

@inproceedings{lin2023im4d,
  title={High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes},
  author={Lin, Haotong and Peng, Sida and Xu, Zhen and Xie, Tao and He, Xingyi and Bao, Hujun and Zhou, Xiaowei},
  booktitle={SIGGRAPH Asia Conference Proceedings},
  year={2023}
}
