
NeRFHowTo

A vanilla Neural Radiance Fields (NeRF) re-implementation in clean, well-annotated PyTorch, for systematically learning neural rendering with the neural fields method.

NeRF pipeline:

pipeline

Given a camera pose, generate rays by ray casting, sample points along each ray by ray sampling, embed the point coordinates by positional encoding, feed them into the MLP that acts as the neural radiance field, and finally form the output maps from the MLP outputs by ray marching.

Note:

ray casting and ray marching here are not exactly the same as the terms commonly used in computer graphics, where they are usually contrasted with ray tracing

the real forward pipeline is not as direct as described here; see models/pipeline.py for details
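
To make the five stages concrete, here is a minimal, self-contained sketch of such a forward pass in PyTorch. All names, shapes and defaults are illustrative only and do not match this repo's actual API; view-dependent colors and hierarchical (coarse-to-fine) sampling are omitted. The real implementation lives in models/pipeline.py.

import torch

def positional_encoding(x, num_freqs=10):
    # Embed each coordinate with sin/cos at exponentially growing frequencies
    freqs = 2.0 ** torch.arange(num_freqs, dtype=torch.float32)
    angles = x[..., None] * freqs                                      # (..., 3, F)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)    # (..., 3, 2F)
    return torch.cat([x, enc.flatten(-2)], dim=-1)                     # (..., 3 + 6F)

def cast_rays(pose_c2w, H, W, focal):
    # Ray casting: one ray per pixel, origin and direction in world coordinates
    j, i = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    dirs = torch.stack([(i - W / 2) / focal, -(j - H / 2) / focal, -torch.ones_like(i)], dim=-1)
    rays_d = dirs @ pose_c2w[:3, :3].T              # rotate camera-frame directions into the world
    rays_o = pose_c2w[:3, 3].expand(rays_d.shape)   # every ray starts at the camera center
    return rays_o, rays_d

def sample_points(rays_o, rays_d, near=2.0, far=6.0, n_samples=64):
    # Ray sampling: stratified depths between near and far along each ray
    t = torch.linspace(near, far, n_samples)
    t = t + torch.rand(*rays_o.shape[:-1], n_samples) * (far - near) / n_samples
    pts = rays_o[..., None, :] + rays_d[..., None, :] * t[..., :, None]  # (..., n_samples, 3)
    return pts, t

def ray_march(raw_sigma, raw_rgb, t):
    # Ray marching: alpha-composite per-point densities and colors into pixel values
    delta = torch.cat([t[..., 1:] - t[..., :-1], torch.full_like(t[..., :1], 1e10)], dim=-1)
    alpha = 1.0 - torch.exp(-torch.relu(raw_sigma) * delta)
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[..., :1]), 1.0 - alpha + 1e-10], dim=-1), dim=-1)[..., :-1]
    weights = alpha * trans
    rgb_map = (weights[..., None] * torch.sigmoid(raw_rgb)).sum(dim=-2)  # (..., 3)
    depth_map = (weights * t).sum(dim=-1)
    return rgb_map, depth_map

# A toy radiance field: an MLP mapping an encoded point to (sigma, r, g, b)
mlp = torch.nn.Sequential(torch.nn.Linear(3 + 6 * 10, 128), torch.nn.ReLU(),
                          torch.nn.Linear(128, 128), torch.nn.ReLU(),
                          torch.nn.Linear(128, 4))

pose = torch.eye(4)
pose[2, 3] = 4.0                                    # camera 4 units up the +z axis, looking down -z at the origin
rays_o, rays_d = cast_rays(pose, H=8, W=8, focal=10.0)
pts, t = sample_points(rays_o, rays_d)
out = mlp(positional_encoding(pts))                 # (..., n_samples, 4)
rgb_map, depth_map = ray_march(out[..., 0], out[..., 1:], t)
print(rgb_map.shape, depth_map.shape)               # torch.Size([8, 8, 3]) torch.Size([8, 8])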

Support Features

  • Clean modularization and decoupling of the five main modules in NeRF, or more generally in the neural-rendering-with-neural-fields method
  • Complete dataset-train-test-taste pipeline
  • Faithful re-implementation of the original description, with messy details removed and only a little optimization added
  • Extensible to custom datasets, custom modules and custom tasting camera settings
  • Examples on several common datasets with reasonably good results

Get Started

# install dependencies
pip install -r requirements.txt

# prepare your dataset in `data`

# prepare your experiment configuration in `configs`

# observe your dataset
python observe_dataset.py --config configs/*.txt

# train the field
python train.py --config configs/*.txt

# visualize the train process if needed
python train_visualize.py --path logs/* --gif --mp4 --fps 30

# test the field
python test.py --config configs/*.txt

# visualize the test results if needed
python test_visualize.py --path logs/* --gif --mp4 --fps 10

# taste with the field
python taste.py --config configs/*.txt

# visualize the taste results if needed
python taste_visualize.py --path logs/* --taste_type <your_type> --generate_type <your_type> --gif --mp4 --fps 30

Demo

Here I use the commonly-used "hello world" dataset of the NeRF field and prepare a demo configuration file that can run on an ordinary personal laptop.

By using python observe_dataset.py --config configs/helloworld_demo.txt, you can see the dataset distribution.

demo-dataset

By using python train.py --config configs/helloworld_demo.txt and then visualizing it, you can see the training process. Training only takes around 2 GB of GPU memory and about 30 minutes, so you can try it easily.

demo-train

demo-train-loss demo-val-psnr

By using python test.py --config configs/helloworld_demo.txt and then visualizing it, you can see the rendering results on the test set, which reach around 24 PSNR.

demo-test

By using python taste.py --config configs/helloworld_demo.txt and then visualizing them, you can play with different camera settings.

demo-taste-surround-rgb demo-taste-surround-d

demo-taste-spiral-rgb demo-taste-spiral-d

demo-taste-fxy-rgb demo-taste-fxy-d
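
The "surround" taste path, for instance, can be thought of as cameras evenly spaced on a circle around the scene, each looking at the origin; the spiral and fxy variants only change how the eye positions and focal lengths vary over frames. Below is a minimal sketch of generating such poses; names, conventions and defaults are illustrative only and are not this repo's actual taste API.

import numpy as np

def look_at_pose(eye, target=np.zeros(3), world_up=np.array([0.0, 0.0, 1.0])):
    # Build a 4x4 camera-to-world matrix whose -z axis points from eye toward target
    forward = target - eye
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, world_up)
    right = right / np.linalg.norm(right)
    up = np.cross(right, forward)
    pose = np.eye(4)
    pose[:3, 0], pose[:3, 1], pose[:3, 2] = right, up, -forward
    pose[:3, 3] = eye
    return pose

def surround_path(radius=4.0, height=1.0, n_frames=60):
    # Cameras evenly spaced on a horizontal circle, all looking at the origin
    thetas = np.linspace(0.0, 2.0 * np.pi, n_frames, endpoint=False)
    eyes = np.stack([radius * np.cos(thetas), radius * np.sin(thetas), np.full(n_frames, height)], axis=-1)
    return np.stack([look_at_pose(eye) for eye in eyes])   # (n_frames, 4, 4)

poses = surround_path()
print(poses.shape)   # (60, 4, 4)

Rendering one frame per pose through the same casting-sampling-encoding-field-marching path used in training produces turntable-style videos like those shown above.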

Analysis

  • advantages: almost all hyperparameters are half of the standard setting, so the time and space complexity are small
  • disadvantages: the reduced capacity means the results are not very good

Results

Dataset helloworld

The helloworld dataset format is described in data/README.md. Training takes around 30 GB of GPU memory and around 7 hours (you may need an A100 :), and testing reaches around 27 PSNR.

helloworld-dataset

helloworld-train

helloworld-train-loss helloworld-val-psnr

helloworld-test

helloworld-surround-rgb helloworld-surround-d

helloworld-spiral-rgb helloworld-spiral-d

helloworld-fxy-rgb helloworld-fxy-d

Analysis

  • advantages: standard configuration achieves good results
  • disadvantages: the helloworld images have low resolution themselves, so detailed textures are hard to learn

Dataset blender

The blender dataset format is described in data/README.md; here I use lego. Training takes around 30 GB of GPU memory and around 21 hours (you may need an A100 :), and testing reaches around 30 PSNR.

lego-dataset

lego-train

(sorry, the metric curves were lost during the long training run, but judging from the results it looks good enough that a detailed look at them is not needed)

lego-test

lego-surround-rgb lego-surround-d

lego-spiral-rgb lego-spiral-d

lego-fxy-rgb lego-fxy-d

Analysis

  • advantages: the lego images have high enough resolution and the dataset is large enough, so detailed textures come out clearly; the dataset is synthetic with no background and accurate geometric correspondence, so the learned field is nice

Dataset llff

The llff dataset format is described in data/README.md; here I use fern. Training takes around 30 GB of GPU memory and around 7 hours (you may need an A100 :), and testing reaches around 21 PSNR.

fern-dataset

fern-train

fern-train-loss fern-val-psnr

fern-test

fern-circle-rgb fern-circle-d

fern-fxy-rgb fern-fxy-d

Analysis

  • disadvantages: the last results show that optimization is hard, perhaps because there are few training views; when the focal length is decreased, many ghosts appear outside the focus area, which is a useful check of whether the field is good enough; the original implementation adds noise regularization here, but I removed it for clarity, which also makes optimization harder
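
For reference, the noise regularization mentioned above is, in the original implementation, Gaussian noise added to the raw density before its activation during training, which discourages semi-transparent floaters. A small sketch of the idea (parameter names illustrative, not this repo's API):

import torch

def density_with_noise(raw_sigma, noise_std=1.0, training=True):
    # Perturb the raw density during training only, then apply the activation
    if training and noise_std > 0.0:
        raw_sigma = raw_sigma + torch.randn_like(raw_sigma) * noise_std
    return torch.relu(raw_sigma)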

Dataset colmap

The colmap dataset format is described in data/README.md; here I use a scene of my own stuff. Training takes around 25 GB of GPU memory and around 20 hours (you may need an A100 :), and testing reaches around 28 PSNR.

stuff-dataset

stuff-train

stuff-train-loss stuff-val-psnr

stuff-test

stuff-surround-rgb stuff-surround-d

stuff-spiral-rgb stuff-spiral-d

stuff-fxy-rgb stuff-fxy-d

Analysis

  • advantages: the laptop, power, face mask, apple, pear, mouse, cup, glasses and pen are all visible
  • disadvantages: the background's depth may exceed the sampling region I set, which makes optimization hard; and because the background is far away, the sampling region has to be large, so fine details are hard to sample, which can be seen on thin structures such as the arms of the glasses
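
The second point can be made concrete: with uniform sampling, the spacing between samples grows linearly with the depth range, so widening the near/far interval to cover a distant background quickly makes the samples coarser than thin structures. A back-of-the-envelope check (numbers illustrative only, not this repo's configuration):

n_samples = 64
for near, far in [(0.5, 2.0), (0.5, 20.0)]:
    # Uniform sample spacing along a ray is (far - near) / n_samples
    print(f"near={near}, far={far}: spacing ~ {(far - near) / n_samples:.3f} scene units")
# With the wider interval the spacing is ~0.30 units, so structures thinner than
# that (e.g. the arms of the glasses) can fall entirely between two samples.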

Note

Kudos to the authors for their amazing results.

@misc{mildenhall2020nerf,
    title={NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis},
    author={Ben Mildenhall and Pratul P. Srinivasan and Matthew Tancik and Jonathan T. Barron and Ravi Ramamoorthi and Ren Ng},
    year={2020},
    eprint={2003.08934},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

I have used the popular PyTorch implementation nerf-pytorch and the Colab notebook NeRF From Nothing as references.

@misc{lin2020nerfpytorch,
  title={NeRF-pytorch},
  author={Yen-Chen, Lin},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished={\url{https://github.com/yenchenlin/nerf-pytorch/}},
  year={2020}
}
@misc{nerffromnothing,
  title={NeRF From Nothing},
  author={Mason McGough},
  publisher = {Google},
  journal = {Google colab},
  howpublished={\url{https://colab.research.google.com/drive/1TppdSsLz8uKoNwqJqDGg8se8BHQcvg_K?usp=sharing}},
  year={2022}
}
