SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis

Guangcong Wang Zhaoxi Chen Chen Change Loy Ziwei Liu

S-Lab, Nanyang Technological University

ICCV 2023

🐤 Project | YouTube | arXiv

🐤 Update:

[20231011] Added a poster.
[20231010] Fixed datasets_depth_kinect.py and datasets_depth_zed.py
[20230906] 📢 Tutorial: a detailed explanation of how to implement SparseNeRF (slides, figure, and pseudo-algorithm table) is released.
[20230822] Added a tutorial on how to integrate SparseNeRF into your own dataset.
[20230820] Added FreeNeRF w/ SparseNeRF, which achieves better results. It shows that SparseNeRF can be integrated into other methods.
[20230814] Code released. Please let us know if you find any bugs. We summarize frequently raised issues in the FAQ.
[20230806] We are working hard on releasing the code and expect to release it in a few days.
[20230328] Released Project | YouTube | arXiv.
[20221004] The old version of SparseNeRF was released. Its performance is slightly worse than that of the current version.

🐤 Features:

✅ Applicable to general scenes. Depth maps come from pre-trained monocular depth estimation models or depth sensors; they are coarse but easy to obtain.
✅ Only 1 GPU is needed for training and testing. Training a scene takes about 2 hours.
✅ Combine SparseNeRF with other methods: FreeNeRF w/ SparseNeRF achieves better results, which shows that SparseNeRF can be integrated into other methods.
✅ FAQ: a list of frequently asked questions.
✅ Use your dataset: a tutorial on how to use your own dataset.
✅ Tutorial: a detailed explanation of how to implement SparseNeRF (slides, figure, and pseudo-algorithm table). If you cannot open the link, you can download it from the tutorial folder.
✅ A poster for the overview. Also see Project | YouTube | arXiv.

🐤 TL;DR: We present SparseNeRF, a simple yet effective method that synthesizes novel views given a few images. SparseNeRF distills robust local depth ranking priors from real-world inaccurate depth observations, such as pre-trained monocular depth estimation models or consumer-level depth sensors.

🐤 Abstract: Neural Radiance Field (NeRF) significantly degrades when only a limited number of views are available. To complement the lack of 3D information, depth-based models, such as DSNeRF and MonoSDF, explicitly assume the availability of accurate depth maps of multiple views. They linearly scale the accurate depth maps as supervision to guide the predicted depth of few-shot NeRFs. However, accurate depth maps are difficult and expensive to capture due to wide-range depth distances in the wild.

In this work, we present a new Sparse-view NeRF (SparseNeRF) framework that exploits depth priors from real-world inaccurate observations. The inaccurate depth observations are either from pre-trained depth models or coarse depth maps of consumer-level depth sensors. Since coarse depth maps are not strictly scaled to the ground-truth depth maps, we propose a simple yet effective constraint, a local depth ranking method, on NeRFs such that the expected depth ranking of the NeRF is consistent with that of the coarse depth maps in local patches. To preserve the spatial continuity of the estimated depth of NeRF, we further propose a spatial continuity constraint to encourage the consistency of the expected depth continuity of NeRF with coarse depth maps. Surprisingly, with simple depth ranking constraints, SparseNeRF outperforms all state-of-the-art few-shot NeRF methods (including depth-based models) on standard LLFF and DTU datasets. Moreover, we collect a new dataset NVS-RGBD that contains real-world depth maps from Azure Kinect, ZED 2, and iPhone 13 Pro. Extensive experiments on NVS-RGBD dataset also validate the superiority and generalizability of SparseNeRF.
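The local depth ranking constraint described above can be illustrated with a small sketch. It is not the repository's implementation: the function name `depth_ranking_loss`, the pair layout, and the `margin` value are assumptions made for illustration, and NumPy is used instead of JAX to keep the example short. The idea shown is that only the order of two pixels in the coarse depth map is used, never its scale.

```python
import numpy as np

def depth_ranking_loss(pred_depth, prior_depth, margin=1e-4):
    """Hinge loss on pixel pairs sampled from the same local patch.

    pred_depth, prior_depth: array-likes of shape (num_pairs, 2). Column 0
    and column 1 hold the depths of the two pixels of each pair: the depth
    rendered by the NeRF and the coarse prior depth, respectively.
    margin: hypothetical small tolerance, chosen here for illustration.
    """
    pred = np.asarray(pred_depth, dtype=np.float64)
    prior = np.asarray(prior_depth, dtype=np.float64)

    # +1 if the prior says pixel 0 is farther than pixel 1, -1 if nearer,
    # 0 if the prior gives no ordering for this pair.
    order = np.sign(prior[:, 0] - prior[:, 1])

    # Signed depth difference predicted by the NeRF for the same pair.
    diff = pred[:, 0] - pred[:, 1]

    # Penalise a pair only when the predicted order contradicts the prior.
    loss = np.maximum(0.0, margin - order * diff)

    # Pairs tied in the prior carry no ranking information.
    loss = np.where(order == 0, 0.0, loss)
    return float(loss.mean())


if __name__ == "__main__":
    # The prior lives on a different scale than the prediction on purpose.
    prior = [[0.2, 0.9], [0.1, 0.8]]
    pred = [[1.0, 2.0], [3.0, 1.0]]  # second pair has the wrong order
    print(depth_ranking_loss(pred, prior))
```

Because the loss depends only on the sign of the prior difference, rescaling or shifting the coarse depth map leaves it unchanged, which is why inaccurate depth maps are still usable as supervision. The same expressions should carry over to `jax.numpy` with minimal changes.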

🐤 Framework Overview: SparseNeRF consists of two streams, i.e., NeRF and depth prior distillation. For the NeRF stream, we use Mip-NeRF as the backbone and train it with a NeRF reconstruction loss. For depth prior distillation, we distill depth priors from a pre-trained depth model. Specifically, we propose a local depth ranking regularization and a spatial continuity regularization to distill robust depth priors from coarse depth maps.
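As a rough illustration of the spatial continuity regularization, the sketch below penalises jumps in the rendered depth between neighbouring pixels wherever the coarse depth map is smooth. This is a simplified stand-in, not the repository's code: the function name `spatial_continuity_loss`, the use of horizontal neighbours only, and the values of `prior_tol` and `margin` are all assumptions for illustration.

```python
import numpy as np

def spatial_continuity_loss(pred_depth, prior_depth, prior_tol=0.01, margin=1e-4):
    """Encourage smooth rendered depth where the coarse prior is smooth.

    pred_depth, prior_depth: array-likes of shape (height, width) for one
    local patch. prior_tol and margin are hypothetical thresholds.
    """
    pred = np.asarray(pred_depth, dtype=np.float64)
    prior = np.asarray(prior_depth, dtype=np.float64)

    # Absolute depth gap between each pixel and its right-hand neighbour.
    prior_gap = np.abs(prior[:, 1:] - prior[:, :-1])
    pred_gap = np.abs(pred[:, 1:] - pred[:, :-1])

    # Neighbours whose prior depths nearly coincide are treated as lying
    # on the same continuous surface.
    continuous = prior_gap < prior_tol
    if not continuous.any():
        return 0.0

    # Only those neighbours are asked to stay close in rendered depth;
    # depth edges in the prior are left unconstrained.
    penalty = np.maximum(0.0, pred_gap - margin)
    return float(penalty[continuous].mean())


if __name__ == "__main__":
    prior = [[0.10, 0.105, 0.50]]  # smooth, then a depth edge
    pred = [[1.0, 1.5, 4.0]]       # jump of 0.5 inside the smooth region
    print(spatial_continuity_loss(pred, prior))
```

In this toy patch only the first neighbour pair counts as continuous, so the large rendered gap across the prior's depth edge is not penalised.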

1. Prerequisites

Linux or macOS
Python 3.6.13
NVIDIA GPU + CUDA 10.1 + cuDNN
OpenCV

2. Installation

We recommend using a virtual environment (conda) to run the code easily.

conda create -n sparsenerf python=3.6.13
conda activate sparsenerf
pip install -r requirements.txt

Download and install the jax+cuda wheel (jaxlib-0.1.68+cuda101-cp36), then remove the downloaded file:

wget https://storage.googleapis.com/jax-releases/cuda101/jaxlib-0.1.68+cuda101-cp36-none-manylinux2010_x86_64.whl

pip install jaxlib-0.1.68+cuda101-cp36-none-manylinux2010_x86_64.whl

rm jaxlib-0.1.68+cuda101-cp36-none-manylinux2010_x86_64.whl

Install PyTorch and related packages for the pre-trained depth models

conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.1 -c pytorch
pip install timm
pip install opencv-python

Install ffmpeg for composing videos

pip install imageio-ffmpeg
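After the installation steps above, a quick check that the main packages can be found may save a failed training run. The snippet below is only a convenience sketch and not part of the repository; the helper name `missing_modules` and the `REQUIRED` list are assumptions (the list uses the import names of the packages installed above, e.g. `cv2` for opencv-python). It relies only on the standard library and runs on Python 3.6.

```python
import importlib.util

# Import names of the packages installed in the steps above (assumed list).
REQUIRED = ["jax", "jaxlib", "torch", "torchvision", "timm", "cv2", "imageio_ffmpeg"]

def missing_modules(names):
    """Return the entries of `names` that cannot be located in this environment."""
    # find_spec looks a top-level module up without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it inside the activated `sparsenerf` environment; any name it prints points to the installation step that needs to be repeated.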

3. Dataset

3.1 Download DTU dataset

Download the DTU dataset from the official website, "Rectified (123 GB)" and "SampleSet (6.3 GB)"
Data: extract "Rectified (123 GB)"
Poses: extract "SampleSet/MVS\ Data/Calibration/cal18/" from "SampleSet (6.3 GB)"
Masks: download masks (used for evaluation only) from this link
Get depth maps: set the variables $root_path, $benchmark, and $dataset_id in scripts/get_depth_map_for_dtu.sh, and run:

sh scripts/get_depth_map_for_dtu.sh

3.2 Download LLFF dataset

Download LLFF from the official download link.
Get depth maps: set the variables $root_path, $benchmark, and $dataset_id in scripts/get_depth_map_for_llff.sh, and run:

sh scripts/get_depth_map_for_llff.sh

3.3 Download NVS-RGBD dataset

Download NVS-RGBD from the official website link

4. Training

4.1 Training on LLFF

Please set the variables in scripts/train_llff.sh and configs/llff3.gin, and run:

sh scripts/train_llff.sh

4.2 Training on DTU

Please set the variables in scripts/train_dtu.sh, and run:

sh scripts/train_dtu.sh

4.3 Training on NVS-RGBD

Similar to 4.1 and 4.2. The depth maps are from depth sensors.

sh scripts/train_kinect.sh

sh scripts/train_zed.sh

sh scripts/train_iphone.sh

5. Test

5.1 Evaluation on LLFF

Please set the variables (the same as in train_llff.sh and train_dtu.sh) in eval_llff.sh or eval_dtu.sh, and run:

sh scripts/eval_llff.sh

5.2 Evaluation on DTU

sh scripts/eval_dtu.sh

5.3 Evaluation on NVS-RGBD

sh scripts/eval_kinect.sh

sh scripts/eval_zed.sh

sh scripts/eval_iphone.sh

6. (Optional) Render videos

Please set the variables (the same as in train_llff.sh and train_dtu.sh) in render_llff.sh or render_dtu.sh, and run:

6.1 Render videos on LLFF

sh scripts/render_llff.sh

6.2 Render videos on DTU

sh scripts/render_dtu.sh

6.3 Render videos on NVS-RGBD

sh scripts/render_kinect.sh

sh scripts/render_zed.sh

sh scripts/render_iphone.sh

7. (Optional) Compose videos

Please set the variables in generate_video_llff.sh or the other scripts, and run:

7.1 Compose videos on LLFF

sh generate_video_llff.sh

7.2 Compose videos on DTU

sh generate_video_dtu.sh

7.3 Compose videos on NVS-RGBD

sh generate_video_kinect.sh

sh generate_video_zed.sh

sh generate_video_iphone.sh

8. (Optional) TensorBoard for visualizing training

tensorboard --logdir=./out/xxx/ --port=6006

If it raises errors, see Q2 of the FAQ.

9. Citation

If you find this useful for your research, please cite our paper.

@inproceedings{wang2022sparsenerf,
   author    = {Wang, Guangcong and Chen, Zhaoxi and Loy, Chen Change and Liu, Ziwei},
   title     = {SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis},
   booktitle = {IEEE/CVF International Conference on Computer Vision (ICCV)},
   year      = {2023},
}

Guangcong Wang, Zhaoxi Chen, Chen Change Loy, and Ziwei Liu. SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis, IEEE/CVF International Conference on Computer Vision (ICCV) 2023.