wahn / rs_pbrt

Rust crate to implement a counterpart to the PBRT book's (3rd edition) C++ code. See also https://www.rs-pbrt.org/about ...

Home page: https://www.rs-pbrt.org

Stochastic Progressive Photon Mapping (SPPM)

wahn opened this issue

See chapter 16.2 of the Physically Based Rendering book for the theoretical background on this topic and for a simple test scene with caustics. Let's use that scene to watch the caustic created by light passing through the glass become increasingly sharp as the number of iterations grows.

Here are the results rendered by the C++ version (using 10 iterations):

> grep sppm f16-9a.pbrt
Integrator "sppm" "integer numiterations" [10] "float radius" .025

[image: f16-9a]

Here are the results rendered by the C++ version (using 100 iterations):

> grep sppm f16-9b.pbrt
Integrator "sppm" "integer numiterations" [100] "float radius" .075

[image: f16-9b]

Here are the results rendered by the C++ version (using 10,000 iterations):

> grep sppm f16-9c.pbrt
Integrator "sppm" "integer numiterations" [10000] "float radius" .075

[image: f16-9c]

There are two options (after commit 5f108ce):

  1. Proceed with render_sppm(...)
  2. Invest some time in profiling

Some arguments for option 2:

The C++ (release) version takes about 11 seconds to render (multi-threaded):

> time ~/builds/pbrt/release/pbrt f16-9a.pbrt
pbrt version 3 (built Dec  4 2018 at 10:10:06) [Detected 8 cores]
...
73.535u 2.423s 0:10.54 720.5%	0+0k 13536+5816io 36pf+0w

The single-threaded (release) version still takes only about 1m05s to render:

> time ~/builds/pbrt/release/pbrt --nthreads 1 f16-9a.pbrt
pbrt version 3 (built Dec  4 2018 at 10:10:06) [Detected 8 cores]
...
63.870u 0.716s 1:04.95 99.4%	0+0k 0+5824io 0pf+0w

I added some progress bars to the current (single-threaded) Rust executable because the current code takes far too long:

> ~/git/github/rs_pbrt/target/release/examples/rs_pbrt -i f16-9a.pbrt
pbrt version 0.5.0 [Detected 8 cores]
...
Generate SPPM visible points ...
9 / 63 [=========================>------------------------------------------------------------------------------------------------------------------------------------------------------------] 14.29 % 0.04/s 22m 
...

That's the estimate after the program has been running for about 4 minutes. In previous runs this phase did finish, but it took over 15 minutes, and these stages are repeated many times. So, yes, maybe it's time to investigate with the perf profiler.
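For reference, wiring one of these progress bars into a phase with the pbr crate looks roughly like this sketch (the tile count and message are placeholders, not the actual render loop):

extern crate pbr;

use pbr::ProgressBar;

fn main() {
    // Hypothetical example: 63 tiles of visible points, as in the log above.
    let num_tiles: u64 = 63;
    let mut progress = ProgressBar::new(num_tiles);
    for _tile in 0..num_tiles {
        // ... generate SPPM visible points for this tile ...
        progress.inc();
    }
    progress.finish_print("Generate SPPM visible points ... done");
}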

Here is a link from a previous discussion about profilers and how to interpret results for recursive functions ...

After about 15 minutes:

35 / 63 [===========================================================================================================================================>----------------------------------------------------------------------------------------------------------------] 55.56 % 0.04/s 12m 
> procs pbrt ; date
 PID   User           | State TTY   CPU   MEM VSZ      RSS      TCP UDP Read  Write | CPU Time Start            | Command                                                                                                                                                                 
                      |             [%]   [%] [bytes]  [bytes]          [B/s] [B/s] |                           |                                                                                                                                                                         
 4965  jan            | R     pts/8 29.0  0.0 18.809M  2.777M   []  []  0     0     | 00:00:00 2019/02/01 10:28 | procs pbrt                                                                                                                                                              
 30775 jan            | R     pts/7 102.7 0.6 328.332M 311.633M []  []  0     0     | 00:15:02 2019/02/01 10:13 | /usr/people/jan/git/github/rs_pbrt/target/release/examples/rs_pbrt -i f16-9a.pbrt                                                                                       
Fri  1 Feb 10:28:30 GMT 2019

I'm using perf on another machine (where it is already installed).

Tell Cargo that we are going to need debugging symbols:

$ pwd
/home/jan/git/self_hosted/Rust/pbrt
$ export RUSTFLAGS='-g'
$ make clobber
$ make

We need to change a kernel setting (as root) before we can run perf as an unprivileged user:

$ pwd
/home/jan/Graphics/Rendering/PBRT/pbrt-v3-scenes/caustic-glass
$ perf record --call-graph=lbr ~/git/self_hosted/Rust/pbrt/target/release/examples/rs_pbrt -i f16-9a.pbrt
perf_event_open(..., PERF_FLAG_FD_CLOEXEC) failed with unexpected error 13 (Permission denied)
perf_event_open(..., 0) failed unexpectedly with error 13 (Permission denied)
Error:
You may not have permission to collect stats.

Consider tweaking /proc/sys/kernel/perf_event_paranoid,
which controls use of the performance events system by
unprivileged users (without CAP_SYS_ADMIN).

The current value is 3:

  -1: Allow use of (almost) all events by all users
      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
      Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
>= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN

To make this setting permanent, edit /etc/sysctl.conf too, e.g.:

        kernel.perf_event_paranoid = -1

As root:

# cat /proc/sys/kernel/perf_event_paranoid
3
# echo "-1" > /proc/sys/kernel/perf_event_paranoid
# cat /proc/sys/kernel/perf_event_paranoid
-1

After approx. one minute:

$ perf record --call-graph=lbr ~/git/self_hosted/Rust/pbrt/target/release/examples/rs_pbrt -i f16-9a.pbrt
$ perf report
+   99.31%     0.00%  rs_pbrt  rs_pbrt             [.] pbrt::core::api::pbrt_cleanup
+   99.18%     0.11%  rs_pbrt  rs_pbrt             [.] pbrt::integrators::sppm::render_sppm
+   97.84%     0.03%  rs_pbrt  rs_pbrt             [.] <pbrt::samplers::halton::HaltonSampler as core::clone::Clone>::clone
-   97.71%    97.42%  rs_pbrt  rs_pbrt             [.] <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T, I>>::from_iter
     97.41% pbrt::core::api::pbrt_cleanup
      - pbrt::integrators::sppm::render_sppm
         - 97.41% <pbrt::samplers::halton::HaltonSampler as core::clone::Clone>::clone
              <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T, I>>::from_iter

Let's see what happens if we implement the HaltonSampler::set_sample_number() method:

impl GlobalSampler for HaltonSampler {
    fn set_sample_number(&mut self, sample_num: i64) -> bool {
        // TODO
        false
    }
}
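For reference, in the C++ version GlobalSampler::SetSampleNumber() roughly does three things: it resets the current dimension, jumps to the global index of the requested sample, and reports whether the sample number is still below samplesPerPixel. A Rust sketch of that logic (the field and method names are assumptions about this port, not the code that was eventually committed):

impl GlobalSampler for HaltonSampler {
    fn set_sample_number(&mut self, sample_num: i64) -> bool {
        // reset the current dimension and jump to the Halton index of this sample
        self.dimension = 0;
        self.interval_sample_index = self.get_index_for_sample(sample_num as u64);
        // recompute the last dimension reserved for array samples
        self.array_end_dim = self.array_start_dim
            + self.sample_array_1d.len() as i64
            + 2 * self.sample_array_2d.len() as i64;
        sample_num < self.samples_per_pixel
    }
}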

After commit e71b416 we should be able to render something using the Rust version of SPPMIntegrator. It might take a very long time to render, not only because the Rust code is currently not multi-threaded, but also because there are other bottlenecks that still need to be identified with the perf profiler.

Anyway, another task is to write an SPPM radius image on request, as shown here:

[image: sppm_radius]

The C++ code produces the image above (in addition to the normal rendered image) if the environment variable SPPM_RADIUS is set:

> setenv SPPM_RADIUS
> ~/builds/pbrt/release/pbrt f16-9a.pbrt
> display sppm_radius.png
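On the Rust side, checking for that environment variable could look like this minimal sketch (the write_radius_image() helper is a hypothetical placeholder):

use std::env;

// Only write the debug radius image when SPPM_RADIUS is set,
// mirroring the behavior of the C++ code.
fn maybe_write_radius_image() {
    if env::var("SPPM_RADIUS").is_ok() {
        // write_radius_image("sppm_radius.png"); // hypothetical helper
        println!("SPPM_RADIUS is set; writing sppm_radius.png");
    }
}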

After commit a7fc9bc:

> ~/git/github/rs_pbrt/target/release/examples/rs_pbrt -i f16-9a.pbrt
...
Trace photons and accumulate contributions ...
127 / 700000 [>---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------] 0.02 % 26237.14/s 27s 
thread 'main' panicked at 'index out of bounds: the len is 700000 but the index is 18446744073709340480', /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b/src/libcore/slice/mod.rs:2463:10
note: Run with `RUST_BACKTRACE=1` for a backtrace.

The huge index is just below 2^64, which suggests that a negative value was cast to usize and wrapped around. Something for tomorrow to debug ...
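A tiny, stand-alone illustration of that kind of wraparound (the value is taken from the panic message, the rest is just an example):

use std::convert::TryFrom;

fn main() {
    // Casting a small negative value to an unsigned type wraps around
    // just below 2^64, producing exactly the index from the panic message.
    let i: i64 = -211_136;
    println!("{}", i as u64); // 18446744073709340480
    // A checked conversion surfaces the underflow instead of wrapping:
    println!("{:?}", usize::try_from(i)); // Err(TryFromIntError(()))
}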

After commit fe5ae69 we can render the first images:

[image: issue_86_first_images]

The trick being used here is that you can tell the renderer to write an image after each iteration. Currently the Rust code returns after the second iteration, most likely here:

pub fn render_sppm(...
) {
...
                        if pdf_pos == 0.0 as Float || pdf_dir == 0.0 as Float || le.is_black() {
                            return;
                        }
                        let mut beta: Spectrum = (le * nrm_abs_dot_vec3(&n_light, &photon_ray.d))
                            / (light_pdf * pdf_pos * pdf_dir);
                        if beta.is_black() {
                            return;
                        }
...
}

This is how you specify that you want an image to be written after each iteration:

> diff f16-9a.pbrt imagewritefrequency.pbrt
12c12
< Integrator "sppm" "integer numiterations" [10] "float radius" .025
---
> Integrator "sppm" "integer numiterations" [10] "float radius" .025 "integer imagewritefrequency" [1]

During the third iteration Rust returns from render_sppm(...):

> ~/git/github/rs_pbrt/target/release/examples/rs_pbrt -i imagewritefrequency.pbrt
...
Generate SPPM visible points ...
...
Generate SPPM visible points ...
...
Generate SPPM visible points ...
...
Trace photons and accumulate contributions ...
194323 / 700000 [==============================================>--------------------------------------------------------------------------------------------------------------------------] 27.76 % 20130.49/s 25s 
pdf_pos = 1, pdf_dir = 1.1879485, le = RGBSpectrum { c: [0.0, 0.0, 0.0] }

The last message was created by the first println!(...) line I added:

> git diff
diff --git a/src/integrators/sppm.rs b/src/integrators/sppm.rs
index b7ff651..2364f80 100644
--- a/src/integrators/sppm.rs
+++ b/src/integrators/sppm.rs
@@ -451,11 +451,13 @@ pub fn render_sppm(
                             &mut pdf_dir,
                         );
                         if pdf_pos == 0.0 as Float || pdf_dir == 0.0 as Float || le.is_black() {
+                            println!("pdf_pos = {}, pdf_dir = {}, le = {:?}", pdf_pos, pdf_dir, le);
                             return;
                         }
                         let mut beta: Spectrum = (le * nrm_abs_dot_vec3(&n_light, &photon_ray.d))
                             / (light_pdf * pdf_pos * pdf_dir);
                         if beta.is_black() {
+                            println!("beta = {:?}", beta);
                             return;
                         }
                         // follow photon path through scene and record intersections

Actually it's light[0] which returns the black (all-zero) emitted radiance:

light[0]: pdf_pos = 1, pdf_dir = 1.1879485, le = RGBSpectrum { c: [0.0, 0.0, 0.0] }

If you compare the per-iteration images with the C++ version, you can see that something (most likely the contribution of the photons) is missing in the two images rendered by Rust (above):

[image: cpp_filmstrip_01]
[image: cpp_filmstrip_02]

Something for tomorrow to debug ...

After commit 863cbed the time to clone the HaltonSampler was drastically reduced:

$ perf record --call-graph=lbr ~/git/self_hosted/Rust/pbrt/target/release/examples/rs_pbrt -i imagewritefrequency.pbrt
$ perf report
-   97.28%     0.79%  rs_pbrt  rs_pbrt                     [.] pbrt::integrators::sppm::render_sppm
   - 96.50% pbrt::integrators::sppm::render_sppm
      + 80.80% <pbr::pb::ProgressBar<T>>::inc
      + 7.38% <pbrt::accelerators::bvh::BVHAccel as pbrt::core::primitive::Primitive>::intersect
      + 3.00% pbrt::core::integrator::uniform_sample_one_light
      + 1.11% pbrt::core::primitive::Primitive::compute_scattering_functions
        0.54% <pbrt::samplers::halton::HaltonSampler as core::clone::Clone>::clone
   + 0.74% pbrt::core::api::pbrt_cleanup

Basically the C++ code shares a static vector<uint16_t> for all instances of the class HaltonSampler:

class HaltonSampler : public GlobalSampler {                                                                                                
...
  private:                                                                                                                                  
    // HaltonSampler Private Data                                                                                                           
    static std::vector<uint16_t> radicalInversePermutations;                                                                                
...
};

On the Rust side you can achieve the same effect by using the lazy_static crate:

/// Generate random digit permutations for Halton sampler
lazy_static! {
    #[derive(Debug)]
    static ref RADICAL_INVERSE_PERMUTATIONS: Vec<u16> = {
        let mut rng: Rng = Rng::new();
        let radical_inverse_permutations: Vec<u16> = compute_radical_inverse_permutations(&mut rng);
        radical_inverse_permutations
    };
}
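With the permutations in a shared static, cloning a HaltonSampler no longer copies the table; every instance (and every clone) just borrows it. A sketch of a per-dimension lookup (the PRIME_SUMS offset table is an assumption borrowed from the C++ code, where PrimeSums plays the same role):

// Hypothetical helper: return the permutation digits for one dimension by
// offsetting into the shared, lazily initialized table.
fn permutation_for_dimension(dim: usize) -> &'static [u16] {
    let start: usize = PRIME_SUMS[dim] as usize;
    &RADICAL_INVERSE_PERMUTATIONS[start..]
}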

After commit 069a859 we start to see some contribution from the photons, but we have to rethink the SPPMPixelListNode:

[image: pbrt]

After commit a7c385d the Rust iterations start to look good:

[image: rust_filmstrip_02]

Nevertheless the iterations stop early:

> ~/git/github/rs_pbrt/target/release/examples/rs_pbrt -i imagewritefrequency.pbrt
pbrt version 0.5.0 [Detected 8 cores]
Copyright (c) 2016-2019 Jan Douglas Bert Walter.
Rust code based on C++ code by Matt Pharr, Greg Humphreys, and Wenzel Jakob.
Film "image"
  "string filename" ["f16-9a.exr"]
  "integer xresolution" [700]
  "integer yresolution" [1000]
  "float scale" [1.5]
Integrator "sppm"
  "integer numiterations" [10]
  "integer imagewritefrequency" [1]
  "float radius" [0.025]
WORK: CreateSPPMIntegrator
BVHAccel::recursive_build(..., 88066, ...)
PT0.466146340S seconds for building BVH ...
BVHAccel::flatten_bvh_tree(...)
PT0.005954928S seconds for flattening BVH ...
Generate SPPM visible points ...
63 / 63 [======================================================================================================================================================================================] 100.00 % 10.86/s  
Compute grid bounds for SPPM visible points ...
Add visible points to SPPM grid ...
Trace photons and accumulate contributions ...
Update pixel values from this pass's photons ...
Writing image "pbrt.png" with bounds Bounds2 { p_min: Point2 { x: 0, y: 0 }, p_max: Point2 { x: 700, y: 1000 } }
Generate SPPM visible points ...
63 / 63 [======================================================================================================================================================================================] 100.00 % 11.24/s  
Compute grid bounds for SPPM visible points ...
Add visible points to SPPM grid ...
Trace photons and accumulate contributions ...
Update pixel values from this pass's photons ...
Writing image "pbrt.png" with bounds Bounds2 { p_min: Point2 { x: 0, y: 0 }, p_max: Point2 { x: 700, y: 1000 } }
Generate SPPM visible points ...
63 / 63 [======================================================================================================================================================================================] 100.00 % 11.43/s  
Compute grid bounds for SPPM visible points ...
Add visible points to SPPM grid ...
Trace photons and accumulate contributions ...
light[0]: pdf_pos = 1, pdf_dir = 1.1879485, le = RGBSpectrum { c: [0.0, 0.0, 0.0] }

Something to debug ...

It looks like the C++ code does return at that point as well (inside the per-photon lambda):

(gdb) info b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000007bb99b in pbrt::SPPMIntegrator::__lambda6::operator()(int) const at /usr/people/jan/git/github/pbrt-v3/src/integrators/sppm.cpp:339
        stop only if iter == 2 && haltonIndex == 1594322
        breakpoint already hit 1 time
(gdb) p Le
$1 = {<pbrt::CoefficientSpectrum<3>> = {static nSamples = <optimized out>, c = {0, 0, 0}}, <No data fields>}
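A likely explanation for why Rust leaves render_sppm() at that point while C++ keeps going (my reading of the situation, not something stated in the commits): in the C++ code this return sits inside the per-photon lambda passed to ParallelFor, so it only skips the current photon, whereas a return in a plain Rust for loop leaves the whole function. A small illustration of the difference:

// Not the renderer code, just an illustration: a `return` inside a closure
// only leaves the closure (like the C++ lambda), while a bare `return`
// inside the loop body would leave process_photons() entirely.
fn process_photons(n: usize) {
    for photon_index in 0..n {
        let trace_one_photon = |i: usize| {
            if i % 2 == 0 {
                return; // skips only this photon
            }
            println!("photon {} traced", i);
        };
        trace_one_photon(photon_index);
    }
}

fn main() {
    process_photons(4);
}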

After commit 549d950 the test scene (f16-9a.pbrt) now renders more or less the same (C++ vs. Rust):

> imf_diff -d -f f16-9a.exr pbrt_rust.exr diff.jpg
differing pixels:	 11.909% (83361 of 700000)
average difference:	  3.269%
maximum difference:	 21.964%
Summary: Many pixels differ.
== "f16-9a.exr" and "pbrt_rust.exr" are different

[image: diff]

Using 100 iterations works as well, but I think it's time to use multi-threading:

[image: f16-9b_rust]

Here is the difference:

> imf_diff -d -f f16-9b.exr pbrt_rust.exr diff.jpg
differing pixels:	 12.997% (90976 of 700000)
average difference:	  2.466%
maximum difference:	  7.955%
Summary: Many pixels differ slightly.
== "f16-9b.exr" and "pbrt_rust.exr" are different

[image: diff]

And here is the Rust version using 10,000 iterations (still rendered without multi-threading):

[image: f16-9c_rust]

After commit 0a7a2c8 all four phases for each iteration are multi-threaded, but it looks like there are situations where several threads can interfere and cause a panic:

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', src/libcore/option.rs:355:21
note: Run with `RUST_BACKTRACE=1` for a backtrace.
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any', src/libcore/result.rs:1009:5

Let's try to avoid that situation by rethinking how the entries (linked lists) for e.g. grid[h] are created (I assume that part works in a thread-safe way) and used afterwards ... The C++ code looks like this:

void SPPMIntegrator::Render(const Scene &scene) {                                           
...
    for (int iter = 0; iter < nIterations; ++iter) {                                        
...
        // Generate SPPM visible points
...
        // Create grid of all SPPM visible points                                           
...
                    for (int z = pMin.z; z <= pMax.z; ++z)                                  
                        for (int y = pMin.y; y <= pMax.y; ++y)                              
                            for (int x = pMin.x; x <= pMax.x; ++x) {                        
                                // Add visible point to grid cell $(x, y, z)$               
                                int h = hash(Point3i(x, y, z), hashSize);                   
                                SPPMPixelListNode *node =                                   
                                    arena.Alloc<SPPMPixelListNode>();                       
                                node->pixel = &pixel;                                       
                                                                                            
                                // Atomically add _node_ to the start of                    
                                // _grid[h]_'s linked list                                  
                                node->next = grid[h];                                       
                                while (grid[h].compare_exchange_weak(                       
                                           node->next, node) == false)                      
                                    ;                                                       
                            }                                                               
...
        // Trace photons and accumulate contributions                                       
...
                            // Add photon contribution to visible points in                 
                            // _grid[h]_                                                    
                            for (SPPMPixelListNode *node =                                  
                                     grid[h].load(std::memory_order_relaxed);               
                                 node != nullptr; node = node->next) {                      
                                ++visiblePointsChecked;                                     
                                SPPMPixel &pixel = *node->pixel;                            
                                Float radius = pixel.radius;                                
                                if (DistanceSquared(pixel.vp.p, isect.p) >                  
                                    radius * radius)                                        
                                    continue;                                               
                                // Update _pixel_ $\Phi$ and $M$ for nearby                 
                                // photon                                                   
                                Vector3f wi = -photonRay.d;                                 
                                Spectrum Phi =                                              
                                    beta * pixel.vp.bsdf->f(pixel.vp.wo, wi);               
                                for (int i = 0; i < Spectrum::nSamples; ++i)                
                                    pixel.Phi[i].Add(Phi[i]);                               
                                ++pixel.M;                                                  
...
        // Update pixel values from this pass's photons                                     
...
        // Periodically store SPPM image in film and write image                            
...
    }                                                                                       
    progress.Done();                                                                        
}                                                                                           
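On the Rust side, the same lock-free "push onto the head of grid[h]" can be expressed with std::sync::atomic::AtomicPtr and a compare_exchange_weak loop. A sketch (the node type and names are assumptions, not the code from the commits):

use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical node type; in the real code the pixel would be a reference
// or index into the per-iteration pixel array.
pub struct SppmPixelListNode {
    pub pixel_index: usize,
    pub next: *mut SppmPixelListNode,
}

// Atomically add `node` to the start of grid[h]'s linked list, mirroring
// the compare_exchange_weak loop in the C++ code above.
pub fn add_node(grid_h: &AtomicPtr<SppmPixelListNode>, node: *mut SppmPixelListNode) {
    let mut head = grid_h.load(Ordering::Relaxed);
    loop {
        unsafe {
            (*node).next = head;
        }
        match grid_h.compare_exchange_weak(head, node, Ordering::Release, Ordering::Relaxed) {
            Ok(_) => break,
            Err(actual) => head = actual,
        }
    }
}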

After commit da87b98 all four phases should be thread-safe.
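A rough sketch of how one of those phases can be split over worker threads (the chunking and the plain std::thread usage here are just an example, not the exact code from the commit):

use std::thread;

// Example: divide the photons of one iteration into equally sized chunks
// and let each worker thread trace its own range of photon indices.
fn trace_photons_parallel(photons_per_iteration: usize, num_threads: usize) {
    let chunk = (photons_per_iteration + num_threads - 1) / num_threads;
    let mut handles = Vec::new();
    for t in 0..num_threads {
        handles.push(thread::spawn(move || {
            let start = t * chunk;
            let end = (start + chunk).min(photons_per_iteration);
            for _photon_index in start..end {
                // follow one photon path and splat its contribution into the
                // shared grid (the atomic linked lists sketched above)
            }
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
}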

Closing the issue ...