dreamworksanimation / openmoonray

MoonRay is DreamWorks’ open-source, award-winning, state-of-the-art production MCRT renderer.

Home Page: https://openmoonray.org/

Crash with example scene glass-of-water when increasing max_depth and max_mirror_depth

rujialiu opened this issue

First, a big thank you for the awesome project!

I've successfully built openmoonray with Docker (host: Windows 10 + Hyper-V) and rendered all 10 example scenes. However, I noticed that the glass-of-water render is darker than both the render in the original collection (by Tungsten) and the render by pbrt.

I think the main reason is that the max depth is only 8 here, versus 33 in the pbrt scene. However, after I changed both max_depth and max_mirror_depth to 32 (I kept max_glossy_depth at 8), it crashed after some time, even when rendering only a very small (1/8 resolution) picture.

Rendering with a high max_mirror_depth is critical for us because we have some scenes with a lot of hanging glasses in the kitchen. In some cases, camera rays can indeed undergo 30+ specular transmissions (validated with other renderers by tuning max depth).

PS: I failed to build XPU support (still need to investigate), so the renderer uses vectorized mode, as seen in the log below.

[root@docker-desktop glass-of-water]# moonray -in scene.rdla -in scene.rdlb -out /tmp/glass-of-water.exr -res 8.0 -debug
Setting mPerThreadRayStatePoolSize to 65536
Setting mPrimaryRayQueueSize to 512
Setting mIncoherentRayQueueSize to 1024
Setting mOcclusionQueueSize to 1024
Setting mShadeQueueSize to 128
Setting mRadianceQueueSize to 512
Setting mShadingWorkloadChunkSize to 32
Setting mPresenceShadowsQueueSize to 1024
Time      Memory
00:00:00  479.2 MB | ---------- Hardware Configuration ------------------------
00:00:00  479.2 MB | Host name                        = docker-desktop
00:00:00  479.2 MB | Number of machines               = 1
00:00:00  479.2 MB | Cluster machine id               = 0
00:00:00  479.2 MB | Thread(s)                        = 28
00:00:00  479.2 MB | ---------- Initialization & Configuration ----------------
00:00:00  479.2 MB | Using OpenImageIO Texture System
00:00:00  479.2 MB | Loading Scene File: scene.rdla
00:00:00  479.2 MB | Loading DwaRefractiveMaterial("/scene/materials/WaterAir/0x214aec0")
00:00:00  479.2 MB | Loading DwaRefractiveMaterial("/scene/materials/IceAir/0x2151d80")
00:00:00  479.2 MB | Loading DwaRefractiveMaterial("/scene/materials/AirIce/0x214a2f0")
00:00:00  479.2 MB | Loading DwaMetalMaterial("/scene/materials/Backdrop/0x21a2490")
00:00:00  479.2 MB | Loading DwaMetalMaterial("/scene/materials/Floor/0x21a2870")
00:00:00  479.2 MB | Loading DwaRefractiveMaterial("/scene/materials/Glass/0x21a25f0")
00:00:00  479.2 MB | Loading Scene File: scene.rdlb
00:00:00  479.2 MB | Overriding 'output file' value with '/tmp/glass-of-water.exr'.
00:00:00  479.2 MB | Overriding 'debug' value with 'true'.
00:00:00  479.2 MB | Overriding 'res' value with '8.0'.
00:00:00  479.2 MB | Read disk I/O = 30.4 MB

00:00:00  479.2 MB | ---------- Hardware Support ------------------------------
00:00:00  479.2 MB | CPU vendor tag                   = AuthenticAMD
00:00:00  479.2 MB | Atomic   8 bit support           = 1
00:00:00  479.2 MB | Atomic  16 bit support           = 1
00:00:00  479.2 MB | Atomic  32 bit support           = 1
00:00:00  479.2 MB | Atomic  64 bit support           = 1
00:00:00  479.2 MB | Atomic 128 bit support           = 1
00:00:00  479.2 MB | SSE support                      = 1
00:00:00  479.2 MB | SSE2 support                     = 1
00:00:00  479.2 MB | SSE3 support                     = 1
00:00:00  479.2 MB | SSE4.1 support                   = 1
00:00:00  479.2 MB | SSE4.2 support                   = 1
00:00:00  479.2 MB | AVX support                      = 1
00:00:00  479.2 MB | AVX2 support                     = 1
00:00:00  479.2 MB | AVX512 support                   = 0
00:00:00  479.2 MB | ---------- Rendering Options -----------------------------
00:00:00  479.2 MB | Command line                     = moonray -in scene.rdla -in scene.rdlb -out /tmp/glass-of-water.exr -res 8.0 -debug
00:00:00  479.2 MB | Executable path                  = /installs/openmoonray/bin/moonray
00:00:00  479.2 MB | Moonray version                  = unknown
00:00:00  479.2 MB | DSO path override                =
00:00:00  479.2 MB | Desired vectorization mode       = AUTO
00:00:00  479.2 MB | SIMD build support               = avx2
00:00:00  479.2 MB | Athena Tags                      =
00:00:00  479.2 MB | Scene file                       = /scenes/glass-of-water/scene.rdla
00:00:00  479.2 MB | Scene file                       = /scenes/glass-of-water/scene.rdlb
00:00:00  479.2 MB | ---------- Scene Variables -------------------------------
00:00:00  479.2 MB | Width                            = 1280
00:00:00  479.2 MB | Height                           = 720
00:00:00  479.2 MB | Resolution                       = 8.00
00:00:00  479.2 MB | Final width                      = 160
00:00:00  479.2 MB | Final height                     = 90
00:00:00  479.2 MB | Final aperture window min x      = 0
00:00:00  479.2 MB | Final aperture window min y      = 0
00:00:00  479.2 MB | Final aperture window max x      = 160
00:00:00  479.2 MB | Final aperture window max y      = 90
00:00:00  479.2 MB | Final region window min x        = 0
00:00:00  479.2 MB | Final region window min y        = 0
00:00:00  479.2 MB | Final region window max x        = 160
00:00:00  479.2 MB | Final region window max y        = 90
00:00:00  479.2 MB | Pixel filter type                = cubicBSpline
00:00:00  479.2 MB | Pixel filter width               = 3
00:00:00  479.2 MB | Texture blur                     = 0
00:00:00  479.2 MB | Output file                      = /tmp/glass-of-water.exr
00:00:00  479.2 MB | Stats file                       =
00:00:00  479.2 MB | DSO path                         = .:/installs/openmoonray/scripts/../rdl2dso:/installs/openmoonray/scripts/../rdl2dso.proxy
00:00:00  479.2 MB | Camera                           = /scene/cameras/PerspectiveCamera_1
00:00:00  479.2 MB | Layer                            = /scene/layers/Layer1
00:00:00  479.2 MB | Debug ray file                   =
00:00:00  479.2 MB | texture_cache_size               = 4000
00:00:00  479.2 MB | Progressive                      = 0
00:00:00  479.2 MB | ---------- Sampling Settings -----------------------------
00:00:00  479.2 MB | Sampling mode                    = 2
00:00:00  479.2 MB | Min adaptive samples             = 16
00:00:00  479.2 MB | Max adaptive samples             = 1024
00:00:00  479.2 MB | Target adaptive error            = 10.00000
00:00:00  479.2 MB | Pixel samples sqrt               = 8
00:00:00  479.2 MB | Light samples sqrt               = 2
00:00:00  479.2 MB | Bsdf samples sqrt                = 2
00:00:00  479.2 MB | Bsdf sampler strategy            = 0
00:00:00  479.2 MB | Bssrdf samples sqrt              = 2
00:00:00  479.2 MB | Max depth                        = 32
00:00:00  479.2 MB | Max diffuse depth                = 2
00:00:00  479.2 MB | Max glossy depth                 = 8
00:00:00  479.2 MB | Max mirror depth                 = 32
00:00:00  479.2 MB | Max hair depth                   = 5
00:00:00  479.2 MB | Max volume depth                 = 1
00:00:00  479.2 MB | Max presence depth               = 16
00:00:00  479.2 MB | Presence threshold               = 0.99900
00:00:00  479.2 MB | Transparency threshold           = 1.00000
00:00:00  479.2 MB | Max subsurface per path          = 1
00:00:00  479.2 MB | Russian roulette threshold       = 0.03750
00:00:00  479.2 MB | Sample clamping value            = 10.00000
00:00:00  479.2 MB | Sample clamping depth            = 1
00:00:00  479.2 MB | Roughness clamping factor        = 0.00000
00:00:00  479.2 MB | Volume quality                   = 0.50000
00:00:00  479.2 MB | Volume illumination samples      = 4
00:00:00  479.2 MB | Volume opacity threshold         = 0.99500
00:00:00  479.2 MB | ---------- Exec Mode Configuration -----------------------
00:00:00  479.2 MB | Vectorized rendering             = 1
00:00:00  479.2 MB | XPU rendering                    = 0
Executing a vectorized render since execution mode was set to auto.

00:00:00  479.2 MB | ---------- Render Prep -----------------------------------
Updating 7 leaf scene objects...
DEBUG (lib.render): DwaRefractiveMaterial("/scene/materials/WaterAir/0x214aec0"): Updating
DEBUG (lib.render): DwaRefractiveMaterial("/scene/materials/IceAir/0x2151d80"): Updating
DEBUG (lib.render): DwaRefractiveMaterial("/scene/materials/AirIce/0x214a2f0"): Updating
DEBUG (lib.render): DwaMetalMaterial("/scene/materials/Backdrop/0x21a2490"): Updating
DEBUG (lib.render): DwaRefractiveMaterial("/scene/materials/Glass/0x21a25f0"): Updating
DEBUG (lib.render): PerspectiveCamera("/scene/cameras/PerspectiveCamera_1"): Updating
DEBUG (lib.render): DwaMetalMaterial("/scene/materials/Floor/0x21a2870"): Updating
Updating 7 scene objects at level 3...
DEBUG (lib.render): RdlMeshGeometry("/scene/objects/<root>_mesh_5_1"): Updating
DEBUG (lib.render): RdlMeshGeometry("/scene/objects/<root>_mesh_4_1"): Updating
DEBUG (lib.render): RdlMeshGeometry("/scene/objects/<root>_mesh_2_1"): Updating
DEBUG (lib.render): RdlMeshGeometry("/scene/objects/<root>_mesh_0_1"): Updating
DEBUG (lib.render): RdlMeshGeometry("/scene/objects/<root>_mesh_1_1"): Updating
DEBUG (lib.render): RectLight("/scene/arealights/<root>_1"): Updating
DEBUG (lib.render): RdlMeshGeometry("/scene/objects/<root>_mesh_3_1"): Updating
Updating 2 scene objects at level 2...
DEBUG (lib.render): RdlInstancerGeometry("/scene/objects/<root>"): Updating
DEBUG (lib.render): LightSet("/scene/lightsets/LightSet1"): Updating
Updating 1 scene object at level 1...
DEBUG (lib.render): Layer("/scene/layers/Layer1"): Updating
Updating 2 scene objects at level 0...
DEBUG (lib.render): SceneVariables("__SceneVariables__"): Updating
DEBUG (lib.render): GeometrySet("/scene/geometrysets/GeometrySet1"): Updating

00:00:00  479.2 MB | ---------- Generating Procedurals ------------------------
00:00:00  479.2 MB | Generating RdlMeshGeometry("/scene/objects/<root>_mesh_1_1")
00:00:00  479.2 MB | Generating RdlMeshGeometry("/scene/objects/<root>_mesh_3_1")
00:00:00  479.2 MB | Generating RdlMeshGeometry("/scene/objects/<root>_mesh_2_1")
00:00:00  479.2 MB | Generating RdlMeshGeometry("/scene/objects/<root>_mesh_4_1")
00:00:00  479.2 MB | Generating RdlMeshGeometry("/scene/objects/<root>_mesh_0_1")
00:00:00  479.2 MB | Generating RdlMeshGeometry("/scene/objects/<root>_mesh_5_1")
00:00:00  534.8 MB | Generating RdlInstancerGeometry("/scene/objects/<root>")
00:00:00  513.4 MB | ---------- Tessellating Geometry -------------------------
DEBUG (lib.render): 00:00:00  513.4 MB | Thread 0       : START tessellating /scene/objects/<root> generated_mesh
DEBUG (lib.render): 00:00:00  513.4 MB | Thread 1       : START tessellating /scene/objects/<root> generated_mesh
DEBUG (lib.render): 00:00:00  513.4 MB | Thread 4       : START tessellating /scene/objects/<root> generated_mesh
DEBUG (lib.render): 00:00:00  513.4 MB | Thread 2       : START tessellating /scene/objects/<root> generated_mesh
DEBUG (lib.render): 00:00:00  513.4 MB | Thread 3       : START tessellating /scene/objects/<root> generated_mesh
DEBUG (lib.render): 00:00:00  513.4 MB | Thread 5       : START tessellating /scene/objects/<root> generated_mesh
DEBUG (lib.render): 00:00:00  513.4 MB | Thread 4       : FINISHED tessellating /scene/objects/<root> generated_mesh
DEBUG (lib.render): 00:00:00  513.4 MB | Thread 3       : FINISHED tessellating /scene/objects/<root> generated_mesh
DEBUG (lib.render): 00:00:00  525.2 MB | Thread 2       : FINISHED tessellating /scene/objects/<root> generated_mesh
DEBUG (lib.render): 00:00:00  525.2 MB | Thread 1       : FINISHED tessellating /scene/objects/<root> generated_mesh
DEBUG (lib.render): 00:00:00  552.3 MB | Thread 5       : FINISHED tessellating /scene/objects/<root> generated_mesh
DEBUG (lib.render): 00:00:00  552.3 MB | Thread 0       : FINISHED tessellating /scene/objects/<root> generated_mesh
00:00:00  546.6 MB | Tessellation finished.
00:00:00  546.6 MB | ---------- Building BVH ----------------------------------
00:00:00  574.2 MB | BVH build finished.
00:00:00  574.2 MB | lower: -18.520609 -18.665905 -43.116508
00:00:00  574.2 MB | upper: 18.540924 11.660841 -14.778393
00:00:00  574.2 MB | ---------- Tessellating Geometry -------------------------
00:00:00  574.2 MB | Tessellation finished.
00:00:00  574.2 MB | ---------- Tessellation time -----------------------------
00:00:00  574.2 MB | Rdl Geometry          part name      time
00:00:00  574.2 MB | --------------------- -------------- ------------
00:00:00  574.2 MB | /scene/objects/<root> generated_mesh 00:00:00.021
00:00:00  574.2 MB | /scene/objects/<root> generated_mesh 00:00:00.015
00:00:00  574.2 MB | /scene/objects/<root> generated_mesh 00:00:00.004
00:00:00  574.2 MB | /scene/objects/<root> generated_mesh 00:00:00.004
00:00:00  574.2 MB | /scene/objects/<root> generated_mesh 00:00:00.000
00:00:00  574.2 MB | /scene/objects/<root> generated_mesh 00:00:00.000
00:00:00  574.2 MB | ---------- Geometry Memory Usage -------------------------
00:00:00  574.2 MB | MB         Geometry Name
00:00:00  574.2 MB | ---------- ------------------------------
00:00:00  574.2 MB | 36.18      /scene/objects/<root>
00:00:00  574.2 MB | 0.00       /scene/objects/<root>_mesh_0_1
00:00:00  574.2 MB | 0.00       /scene/objects/<root>_mesh_1_1
00:00:00  574.2 MB | 0.00       /scene/objects/<root>_mesh_2_1
00:00:00  574.2 MB | 0.00       /scene/objects/<root>_mesh_3_1
00:00:00  574.2 MB | 0.00       /scene/objects/<root>_mesh_4_1
00:00:00  574.2 MB | 0.00       /scene/objects/<root>_mesh_5_1
00:00:00  574.2 MB | ---------- Memory Summary --------------------------------
00:00:00  574.2 MB | Geometry memory                  = 36.18 MB
00:00:00  574.2 MB | BVH memory                       = 26.61 MB
00:00:00  574.2 MB | Total memory                     = 62.79 MB

00:00:00  574.2 MB | ---------- Render Prep Stats ---- (hh:mm:ss.ms) ----------
00:00:00  574.2 MB | Loading scene                    = 00:00:00.160
00:00:00  574.2 MB | Initialize renderer              = 00:00:00.002
00:00:00  574.2 MB | Generating procedurals           = 00:00:00.026
00:00:00  574.2 MB | Tessellation                     = 00:00:00.043
00:00:00  574.2 MB | Building BVH                     = 00:00:00.077
00:00:00  574.2 MB | Building GPU BVH                 = 00:00:00.000
00:00:00  574.2 MB | -----------------------------------------------
00:00:00  574.2 MB | Total render prep                = 00:00:00.305
00:00:00  574.2 MB | Render prep read disk I/O        = 30.365 MB

00:00:00  574.2 MB | ---------- Dso Usage ----------
00:00:00  574.2 MB | RdlInstancerGeometry          1
00:00:00  574.2 MB | RdlMeshGeometry               6
00:00:00  574.2 MB | RectLight                     1


00:00:00  574.2 MB | ---------- MCRT Rendering --------------------------------
no-extra-snapshot
DEBUG (lib.render): Call to Material::deferEntriesForLaterProcessing encountered (accumulator stack = 1006, handler stack = 335.                                                                                      ]   0.0%
Warning (lib.render): Call to Material::deferEntriesForLaterProcessing encountered (accumulator stack = 1006, handler stack = 335.
DEBUG (lib.render): Call to Material::deferEntriesForLaterProcessing encountered (accumulator stack = 1006, handler stack = 335.
Warning (lib.render): Call to Material::deferEntriesForLaterProcessing encountered (accumulator stack = 1006, handler stack = 335.
DEBUG (lib.render): Call to Material::deferEntriesForLaterProcessing encountered (accumulator stack = 1006, handler stack = 335.
Warning (lib.render): Call to Material::deferEntriesForLaterProcessing encountered (accumulator stack = 1006, handler stack = 335.
DEBUG (lib.render): Call to Material::deferEntriesForLaterProcessing encountered (accumulator stack = 1006, handler stack = 335.
Warning (lib.render): Call to Material::deferEntriesForLaterProcessing encountered (accumulator stack = 1006, handler stack = 335.
DEBUG (lib.render): Multiple calls to Material::deferEntriesForLaterProcessing encountered, no more will be reported this frame.
Warning (lib.render): Multiple calls to Material::deferEntriesForLaterProcessing encountered, no more will be reported this frame.
DEBUG (lib.render): Block size too small to satisfy allocation in arena allocator, 34656256 wanted (64 byte aligned), 33554432 block size.                                                                            ]   0.0%

Warning (lib.render): Block size too small to satisfy allocation in arena allocator, 34656256 wanted (64 byte aligned), 33554432 block size.

Error: Block size too small to satisfy allocation in arena allocator, 34656256 wanted (64 byte aligned), 33554432 block size.


SIGSEGV(segfault) callstack
librendering_mcrt_common.so(_ZN7moonray11mcrt_common19debugPrintCallstackEPKc+0x74) [0x7f9ac8f988e4]
 libapplication.so(_ZN7moonray17stackTraceHandlerEi+0xa4) [0x7f9ac9147a64]
  libc.so.6(+0x36400) [0x7f9ab717c400]
   librendering_geom.so(_ZN7moonray4geom22initIntersectionPhase1ERNS_7shading12IntersectionEPNS_11mcrt_common16ThreadLocalStateERKNS4_3RayEPKN10scene_rdl24rdl25LayerE+0xb0) [0x7f9abc8aa210]
    librendering_geom.so(_ZN7moonray4geom20initIntersectionFullERNS_7shading12IntersectionEPNS_11mcrt_common16ThreadLocalStateERKNS4_3RayEPKN10scene_rdl24rdl25LayerEiiibRKNSA_4math4Vec2IfEERKNSF_4Vec3IfEE+0x3d) [0x7f9abc8aad5d]
     librendering_pbr.so(_ZN7moonray3pbr18shadeBundleHandlerEPNS_11mcrt_common16ThreadLocalStateEjPNS_7shading14SortedRayStateEPv+0x415) [0x7f9abcef4d25]
      librendering_pbr.so(_ZN7moonray3pbr16rayBundleHandlerEPNS_11mcrt_common16ThreadLocalStateEjPNS0_15WrappedRayStateEPv+0x102b) [0x7f9abcef257b]
       librendering_pbr.so(_ZN7moonray3pbr18IncoherentRayQueue10addEntriesEPNS_11mcrt_common16ThreadLocalStateEjPPNS0_8RayStateEPN10scene_rdl25alloc5ArenaE+0x1d9) [0x7f9abced2909]
        librendering_pbr.so(CPP_addIncoherentRayQueueEntries+0xcc) [0x7f9abcf2544c]

Aborted

Everything up to the message "Warning (lib.render): Multiple calls to Material::deferEntriesForLaterProcessing encountered, no more will be reported this frame." appears very quickly, after which the progress stays at 0%. The message "DEBUG (lib.render): Block size too small to satisfy allocation in arena allocator, 34656256 wanted (64 byte aligned), 33554432 block size." appears much, much later, just before the SIGSEGV.
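For reference, the two numbers in the arena allocator message can be put into more readable units. This is plain arithmetic on the logged values only; it assumes nothing about how MoonRay sizes its arenas, and the variable names are just for illustration.

```python
# Values copied from the "Block size too small" message in the log above.
MIB = 1024 * 1024
wanted = 34656256      # bytes requested (64 byte aligned)
block_size = 33554432  # arena block size in bytes

shortfall = wanted - block_size

print(f"block size: {block_size / MIB:.2f} MiB")
print(f"wanted:     {wanted / MIB:.2f} MiB")
print(f"shortfall:  {shortfall} bytes ({shortfall / MIB:.2f} MiB)")
```

So the request exceeds a single 32 MiB block by roughly 1 MiB.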

Thanks for the report. You may be able to work around this issue by using the scalar integrator.

I believe the command line option is -exec_mode scalar, but I am not in front of a computer.

@kjeffery Thanks for the reply! Indeed, it doesn't crash in scalar mode:

moonray -in scene.rdla -in scene.rdlb -out /tmp/glass-of-water.exr -res 8.0 -debug -exec_mode scalar

However, the execution time seems to grow exponentially with max_depth. Here is the total time for depths 8 to 12:

depth Total time
8 00:00:10.721
9 00:00:19.044
10 00:00:37.790
11 00:01:07.700
12 00:02:00.211

Then I continued the experiments with max_adaptive_samples = 64, with these results:

depth Total time
12 00:00:21.439
13 00:00:36.173
14 00:01:01.402
15 00:01:43.843
16 00:02:58.780
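
A quick way to check the "exponential" impression is to compute the ratio between consecutive rows of the two tables above. The sketch below only uses the timings listed in this comment; the helper names (to_seconds, growth_ratios) are made up for illustration.

```python
def to_seconds(stamp):
    """Convert an hh:mm:ss.ms string into seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# Total render times copied from the two tables above, keyed by max depth.
first_table = {8: "00:00:10.721", 9: "00:00:19.044", 10: "00:00:37.790",
               11: "00:01:07.700", 12: "00:02:00.211"}
second_table = {12: "00:00:21.439", 13: "00:00:36.173", 14: "00:01:01.402",
                15: "00:01:43.843", 16: "00:02:58.780"}  # max_adaptive_samples = 64

def growth_ratios(times_by_depth):
    """Ratio of each total time to the time at the previous depth."""
    depths = sorted(times_by_depth)
    secs = [to_seconds(times_by_depth[d]) for d in depths]
    return [later / earlier for earlier, later in zip(secs, secs[1:])]

for name, table in (("first table", first_table), ("second table", second_table)):
    print(name, [round(r, 2) for r in growth_ratios(table)])
```

Each extra level of depth multiplies the total time by a factor of roughly 1.7 to 2.0, which is what geometric (exponential) growth looks like.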

As a comparison, when rendered with pbrt v4 in CPU mode, it takes 65 seconds at full resolution with spp=64. Here is the start of the pbrt file (converted to v4):

Integrator "bdpt"
    "integer maxdepth" [ 33 ]
Transform [ 0.999993 -0.0000592354 0.00373545 -0 7.27596e-12 0.999874 0.0158556 -0 0.00373592 0.0158555 -0.999867 -0 -0.0065528 -3.10084 25.6268 1  ]
Sampler "independent"
    "integer pixelsamples" [ 64 ]
PixelFilter "triangle"
    "float xradius" [ 1 ]
    "float yradius" [ 1 ]
Film "rgb"
    "string filename" [ "glass-of-water.png" ]
    "integer yresolution" [ 720 ]
    "integer xresolution" [ 1280 ]
Camera "perspective"
    "float fov" [ 20.114292 ]

If I change it to 1/8 resolution (i.e. 160x90), it takes only 1 second.

I understand that the comparison doesn't make much sense, and I noticed that with max_adaptive_samples = 64 the image is much cleaner than pbrt's at spp=64, but pbrt doesn't seem to show such steep runtime growth when maxdepth increases.

Is this the expected behavior? What should I do if I really need to set high maxdepth? In this case, I don't care about "correctness", just want it to look good and avoid very dark areas.

Increasing the depth to higher counts will put more pressure on the C++ stack, which holds information about the order handlers will get executed in, how many elements in each queue are left to process and so on. There is a stack overflow protection mechanism which kicks in at some point for safety where queued material evaluations are saved off to the side for deferred processing without increasing the handler stack depth any further (it seems this is where the bug is). One thing you could try is to grep the codebase for the line:

#define MAX_EXCL_ACCUM_STACK_SIZE 1024

and set it to a sufficiently high value (e.g. 4096) such that you don't see the deferEntriesForLaterProcessing errors any longer. As long as the stack doesn't overflow, your scene should render in vector mode.

From a performance point of view, since this scene contains a lot of specular bounces/transmissions, one thing to try would be to set the bsdf_sampler_strategy attribute to "one-sample" or "one-lobe", and the bsdf_samples attribute to 1. These attributes are mentioned here.
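
One way to see why a single-sample strategy might help: if every bounce spawns several continuation rays, the ray count grows geometrically with depth, whereas one continuation ray per bounce grows only linearly. The toy model below is an assumption for illustration. The function rays_traced and its branching parameter are hypothetical and do not describe MoonRay's actual integrator, which also applies Russian roulette and per-lobe depth limits.

```python
def rays_traced(branching, max_depth):
    """Rays traced for one camera sample in a toy path tree.

    Assumes (hypothetically) that every hit spawns `branching`
    continuation rays and that no path terminates before max_depth.
    """
    return sum(branching ** depth for depth in range(1, max_depth + 1))

for depth in (8, 16, 32):
    print(depth, rays_traced(2, depth), rays_traced(1, depth))
```

With a branching factor of 2, the count roughly doubles with every extra level of depth; with a factor of 1, depth 32 costs only four times as much as depth 8.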

Thank you! bsdf_sampler_strategy is exactly what I'm looking for! Scalar mode can now render even max_depth=64 in a short time (without changing bsdf_samples to 1).
My final SceneVariables is:

SceneVariables {
    ["image_width"] = 1280,
    ["image_height"] = 720,
    ["lights_visible_in_camera"] = true,
    ["output_file"] = "",
    ["max_depth"] = 32,
    ["max_diffuse_depth"] = 2,
    ["max_glossy_depth"] = 8,
    ["max_mirror_depth"] = 32,
    ["max_adaptive_samples"] = 1024,
    ["min_adaptive_samples"] = 16,
    ["sampling_mode"] = 2,
    ["target_adaptive_error"] = 10,
    ["bsdf_sampler_strategy"] = "one-sample",
}

With this config, vectorized mode no longer crashes even with depth=64 (and I didn't bother rebuilding it with MAX_EXCL_ACCUM_STACK_SIZE increased), but the result is darker (scalar mode's result matches pbrt's), so I guess there's still a bug somewhere. Anyway, I'm happy with this scene now. Thank you both :)

A caution: changing the bsdf_sampler_strategy switches to different integrator code that is a bit experimental, so you may see look differences when changing this mode from the default.

Thanks @jmahovsky-dwa! Actually, I was wondering whether it's possible to switch bsdf_sampler_strategy based on the current depth (e.g. use the "one-sample" strategy only when the current depth is high) by hacking some code, but I forgot to ask. Based on your reply, it looks difficult or impossible :(

Most of our scenes don't have this problem, but we can't know in advance, so I'd prefer some kind of "adaptive" behavior that uses the default integrator most of the time.

Yeah, the sampler strategy is currently an all-or-nothing thing. Yes, it does make sense to use fewer samples at greater depths but the integrator doesn't currently support that.

Understood. Thanks for the clarification!