A "from scratch" re-implementation of 3D Gaussian Splatting for Real-Time Radiance Field Rendering by Kerbl and Kopanas et al.
This repository implements the forward and backward passes as a PyTorch CUDA extension, based on the algorithms described in the paper. Some details of the splatting and adaptive control algorithms are not explicitly described in the paper, so there may be differences between this repo and the official implementation.
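The usual pattern for wiring a CUDA extension into PyTorch's autograd is a custom `torch.autograd.Function` whose `forward` and `backward` call the compiled kernels. The sketch below illustrates that pattern only; the name `splat_cuda` and its functions are hypothetical, and the kernel calls are replaced by plain tensor ops so the example runs without the compiled extension.

```python
import torch

class Rasterize(torch.autograd.Function):
    """Illustrative autograd wrapper; real code would call the CUDA extension,
    e.g. splat_cuda.rasterize_forward(...) (hypothetical name)."""

    @staticmethod
    def forward(ctx, gaussians, weights):
        # Save inputs needed by the backward pass.
        ctx.save_for_backward(gaussians, weights)
        # Stand-in for the forward CUDA kernel: weighted sum over gaussians.
        return (gaussians * weights).sum(dim=0)

    @staticmethod
    def backward(ctx, grad_image):
        gaussians, weights = ctx.saved_tensors
        # Stand-in for the backward CUDA kernel: analytic gradients
        # of the weighted sum w.r.t. both inputs.
        return grad_image * weights, grad_image * gaussians

render = Rasterize.apply
```

Calling `render(gaussians, weights)` then participates in `loss.backward()` like any built-in op, with gradients supplied by the hand-written backward pass instead of autograd tracing.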
The forward and backward pass algorithms are detailed in `MATH.md`.
Evaluations were done with the Mip-NeRF 360 dataset at ~1 megapixel resolution. This corresponds to the 2x-downsampled indoor scenes and the 4x-downsampled outdoor scenes. Every 8th image was used for the test split.
Here are some comparisons with the official implementation (copied from its "Per-Scene Error Metrics").
Method | Dataset | PSNR | SSIM | N Gaussians | Train Duration* |
---|---|---|---|---|---|
Official-30k | Garden 1/4x | 27.41 | 0.87 | | ~35-45min (A6000) |
Ours-30k | Garden 1/4x | 26.86 | 0.85 | 2.78M | ~21min (RTX4090) |
Official-7k | Garden 1/4x | 26.24 | 0.83 | | |
Ours-7k | Garden 1/4x | 25.80 | 0.80 | 1.61M | ~3min (RTX4090) |
Official-30k | Counter 1/2x | 28.70 | 0.91 | | |
Ours-30k | Counter 1/2x | 28.60 | 0.90 | 2.01M | ~26min (RTX4090) |
Official-7k | Counter 1/2x | 26.70 | 0.87 | | |
Ours-7k | Counter 1/2x | 27.42 | 0.89 | 1.40M | ~5min (RTX4090) |
Official-30k | Bonsai 1/2x | 31.98 | 0.94 | | |
Ours-30k | Bonsai 1/2x | 31.45 | 0.94 | 0.84M | ~18min (RTX4090) |
Official-7k | Bonsai 1/2x | 28.85 | 0.91 | | |
Ours-7k | Bonsai 1/2x | 29.98 | 0.93 | 1.16M | ~4min (RTX4090) |
Official-30k | Room 1/2x | 30.63 | 0.91 | | |
Ours-30k | Room 1/2x | 31.52 | 0.92 | 1.84M | ~21min (RTX4090) |
Official-7k | Room 1/2x | 28.14 | 0.88 | | |
Ours-7k | Room 1/2x | 29.13 | 0.90 | 1.01M | ~3min (RTX4090) |
*Training times are not directly comparable across the different GPUs, since the RTX4090 is faster than the A6000; on the same hardware, the training speed of the two methods should be similar.
A comparison using one of the test images from the garden dataset. The official implementation's image appears more saturated because it was extracted from the published PDF. The branch in the exploded view and the wall are reconstructed more crisply in our implementation, but the official implementation performs better on the trees and bushes.
The gradient computation kernels are currently templated to support `float64` tensors, which are required by `torch.autograd.gradcheck`. All of the backward passes have `gradcheck` unit test coverage and should compute the correct gradients for the corresponding forward pass. A downside of this templating is that the kernels cannot use `float2/3/4` types, which could improve performance through better memory alignment.
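The reason `float64` support matters is that `gradcheck` compares the analytic backward pass against finite-difference approximations, which are far too noisy in single precision. A minimal sketch of the test pattern, using a hypothetical stand-in op rather than one of this repo's actual kernels:

```python
import torch

class ScaleAndSquare(torch.autograd.Function):
    """Toy differentiable op standing in for a splatting kernel."""

    @staticmethod
    def forward(ctx, x, scale):
        ctx.save_for_backward(x, scale)
        return scale * x * x

    @staticmethod
    def backward(ctx, grad_out):
        x, scale = ctx.saved_tensors
        # Hand-derived gradients: d/dx = 2*scale*x, d/dscale = x^2.
        return grad_out * 2.0 * scale * x, grad_out * x * x

# gradcheck requires double-precision inputs with requires_grad=True;
# it raises (or returns False) if the analytic and numeric gradients differ.
x = torch.randn(5, dtype=torch.float64, requires_grad=True)
s = torch.randn(5, dtype=torch.float64, requires_grad=True)
assert torch.autograd.gradcheck(ScaleAndSquare.apply, (x, s))
```

If the backward pass had a sign or indexing bug, `gradcheck` would report the mismatching Jacobian entries, which is what the unit tests rely on.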
The discrepancy in PSNR is most likely due to differences in the adaptive control algorithm and its tuning.
This package requires CUDA, which can be installed from here.
- Install Python dependencies

  ```shell
  pip install -r requirements.txt
  ```

- Install the PyTorch CUDA extension

  ```shell
  pip install -e ./
  ```
Note:
- Windows systems may need to modify the compilation flags in `setup.py`
- This step may be sensitive to the version of `pip`. This step failed after upgrading from `23.0.1` to `23.3.2`
- If `pip install` fails, this may work: `python setup.py build_ext && python setup.py install`
Optional: this project uses `clang-format` to lint the C++/CUDA files:

```shell
sudo apt install clang-format
```

Running `lint.sh` will run both `black` and `clang-format`.
- Download the Mip-NeRF 360 dataset and unzip it

  ```shell
  wget http://storage.googleapis.com/gresearch/refraw360/360_v2.zip && unzip 360_v2.zip
  ```

- Update `DATASET_PATH` in `splat_py/constants.py` to point to `garden`, with `DOWNSAMPLE_FACTOR = 4`
- Run `colmap_splat.py`
To run all unit tests:

```shell
python -m unittest discover test
```