bevyengine / bevy

A refreshingly simple data-driven game engine built in Rust

Home Page: https://bevyengine.org

PBR / Clustered Forward Rendering

aclysma opened this issue · comments

This is a Focus Area tracking issue

PBR is a standard-ish way of rendering realistically in 3D. There is both a lot of interest and a lot of brain-power in this area, so it makes sense to build PBR now. This focus area has the following (minimum) requirements:

  • PBR Shaders (which implies HDR)
  • Bloom (to convey HDR)
  • Shadowing (forces us to build out a real "pipeline")
  • Battle-test the current mid-level rendering abstractions and rework them where necessary

Active Crates / Repos

  • @StarArawn's draft PR with an implementation based on Filament: #261

Sub Issues

No active issues discussing subtopics for this focus area. If you would like to discuss a particular topic, look for a pre-existing issue in this repo. If you can't find one, feel free to make one! Link to it in this issue and I'll add it to the index.


Original Post (sorry @aclysma for stomping on this)

There was a discord conversation that I think is worth capturing. I'll do my best but I may miss some people or get some sentiments wrong. I also don't know everyone's github names.

StarToaster, fusha, and aclysma (me) all commented that clustered forward rendering was a good overall model to pursue. (Possibly also matthewfcarlson as well, not sure if he was agreeing or just linking a helpful doc :D)

StarToaster: I kinda hope we don't add a deferred renderer. We'll still need some sort of G-buffer for SSAO, but clustered shading is more accurate and faster when compared with deferred shading.
matthewfcarlson: the Filament doc linked in the PBR milestone has a good explanation of CFR along the frustum
Fusha: also, throwing my hat into the mix in support of clustered forward rendering
aclysma: forward is intuitive to work with and lots of techniques "just work", and clustering mitigates the issue with forward having practical limits to light sources

@cart suggested later:

cart: We'll need to start building a plan, but in the short term if you're interested, start familiarizing yourself with the current state of the bevy renderer. And maybe check out the Google Filament document. It's a nice overview of pbr implementation.

To summarize, the main advantages of this approach are:

  • Most techniques work simply and intuitively (for example, you can use plain MSAA)
  • A good choice for VR and mobile
  • We can start by simply implementing/extending forward rendering, and transition to clustered later without too much waste
  • A good balance of simplicity and good performance with many light sources
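To make the "clustered" part of the discussion concrete, here is a minimal sketch of how a fragment might be mapped to a light cluster. The grid dimensions and function names are illustrative assumptions, not anything Bevy implements; the exponential depth slicing is the commonly cited approach (thinner slices near the camera, thicker ones far away).

```rust
// Hypothetical sketch of fragment-to-cluster mapping for clustered forward
// shading. The 16x9x24 grid is a common example choice, not a Bevy constant.
const CLUSTERS_X: u32 = 16;
const CLUSTERS_Y: u32 = 9;
const CLUSTERS_Z: u32 = 24;

/// Map a fragment's screen position (0..1 in each axis) and view-space depth
/// to a flat cluster index. Depth slices are distributed exponentially so
/// near clusters are thinner than far ones.
fn cluster_index(screen_uv: (f32, f32), view_z: f32, z_near: f32, z_far: f32) -> u32 {
    let x = ((screen_uv.0 * CLUSTERS_X as f32) as u32).min(CLUSTERS_X - 1);
    let y = ((screen_uv.1 * CLUSTERS_Y as f32) as u32).min(CLUSTERS_Y - 1);
    // Exponential slicing: slice = log(z / near) / log(far / near) * N
    let slice = ((view_z / z_near).ln() / (z_far / z_near).ln() * CLUSTERS_Z as f32) as u32;
    let z = slice.min(CLUSTERS_Z - 1);
    x + y * CLUSTERS_X + z * CLUSTERS_X * CLUSTERS_Y
}
```

At shading time, each fragment computes its index, looks up the per-cluster light list, and iterates only those lights — which is what lifts the practical light-count limit of plain forward rendering.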

Hey, that's a great discussion to have. I suggest changing the title to something more actionable so that we can instinctively have an idea of the "lifetime" of this issue. An example would be "Make a plan regarding X" or "Investigate the possibility of Y". It's just a small guideline that helps keep track of where a discussion should go and end :)

Or maybe even just Clustered Rendering. As @GabLotus said, in general I'd like to avoid issues without a clear definition of done. I'm fine with issues being big, but they should have clear outcomes.

(I also fully admit that I've set a bad example with some of my past issues)

What I'd like to see happen with this task is to gather opinions and possibly find consensus on a high level default render pipeline structure for bevy to target in the very near term and a little longer term. (i.e. very near term = ~3 months, little longer term = ~12 months). Ideally we could come up with a very brief roadmap that provides value now and also provides a path to improve on it later.

I would recommend keeping focus on forward rendering today, and favor choices that can fold well into doing clustered forward rendering later.

I'm Fusha on Discord, basically just coming to throw my 👍 into the hat on clustered forward but also to comment and mention a couple other related things:

  1. One of the biggest downsides of (clustered) forward in my view is that it's basically going to rely on a big "uber-shader" to do, say, 90% of the heavy lifting of shading. The main issue with this (at least the one that I think of first) is that this creates the potential for a lot of code duplication, especially if you want to be able to dynamically change "graphics settings" (which is something I would say we definitely want to eventually). It's probably a good idea to have some sort of plan on how to manage that.

  2. Also, somewhat related, there's a few things mentioned but not explicitly talked about too much in Filament that are quite useful and relate to this... a couple off the top of my head being NPR tonemapping (such as false-color ramp based on luminance) for debugging, and then doing exposure adjustment on the original light radiance values, before all the lighting calculations happen, rather than doing exposure correction after lighting calculations: this allows the use of half-precision floats for all the shading calculations which is particularly valuable for low-end and mobile graphics processors.
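The pre-exposure idea in point 2 can be sketched numerically. The sun intensity and exposure values below are made up for illustration; the point, as in the Filament docs, is that scaling light intensities by camera exposure *before* shading keeps intermediate radiance values within half-float range, whereas raw photometric values overflow it.

```rust
// Largest finite half-precision (f16) value.
const F16_MAX: f32 = 65504.0;

/// Pre-expose a light's intensity before any lighting math happens, so
/// all shading intermediates stay representable in half precision.
/// (Values below are illustrative, not from Bevy or Filament.)
fn pre_expose(light_intensity: f32, exposure: f32) -> f32 {
    light_intensity * exposure
}
```

For example, bright daylight is on the order of 120,000 lux, which already exceeds the f16 maximum of 65,504 before any BRDF math runs; pre-exposed with a plausible daylight exposure it shrinks to a small single-digit value, leaving ample headroom.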

commented

One of the biggest downsides of (clustered) forward in my view is that it's basically going to rely on a big "uber-shader" to do, say, 90% of the heavy lifting of shading. The main issue with this (at least the one that I think of first) is that this creates the potential for a lot of code duplication, especially if you want to be able to dynamically change "graphics settings" (which is something I would say we definitely want to eventually). It's probably a good idea to have some sort of plan on how to manage that.

Hmm, I don't think you necessarily need to rely on one big uber shader. Godot does this and quite frankly it's a mess. Plus WGSL won't support shader defines. Instead, if we really want more modular, adaptive shaders, we should likely build them from smaller pieces programmatically. The other option is to use shader includes as much as possible to dedup code and keep different shaders in different files. Currently, though, the shader system in place does not support shader includes. I created an issue about it here: #185
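For illustration, building shaders from smaller pieces can be as simple as textual include resolution at load time. The `#include "name"` directive syntax and the sources map below are invented for this sketch; they are not part of Bevy's shader system (which, as noted, does not support includes yet).

```rust
use std::collections::HashMap;

// Hypothetical map from include name to shader source text.
type ShaderSources = HashMap<&'static str, &'static str>;

/// Recursively replace `#include "name"` lines with the named source,
/// so shared chunks (e.g. PBR lighting) can be deduplicated across shaders.
fn resolve_includes(src: &str, lib: &ShaderSources) -> String {
    src.lines()
        .map(|line| {
            let trimmed = line.trim();
            if let Some(name) = trimmed
                .strip_prefix("#include \"")
                .and_then(|rest| rest.strip_suffix('"'))
            {
                // Recurse so included files can themselves include others.
                resolve_includes(lib.get(name).copied().unwrap_or(""), lib)
            } else {
                line.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

A real implementation would also need cycle detection and error reporting for missing includes, but this shows the basic shape of the "shared include" approach discussed above.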

Also, somewhat related, there's a few things mentioned but not explicitly talked about too much in Filament that are quite useful and relate to this... a couple off the top of my head being NPR tonemapping (such as false-color ramp based on luminance) for debugging, and then doing exposure adjustment on the original light radiance values, before all the lighting calculations happen, rather than doing exposure correction after lighting calculations: this allows the use of half-precision floats for all the shading calculations which is particularly valuable for low-end and mobile graphics processors.

I'm not familiar with using NPR tonemapping in this way; do you have any articles or papers on the subject? On mobile, reducing render passes becomes much, much more important. I'm not convinced we should limit desktop by mobile (low-end) constraints though. Perhaps going down the path of splitting the renderer into two different graphs would make more sense. Internally we could build the graph based off of user settings and hardware limits.

Also this is an interesting read:
http://efficientshading.com/wp-content/uploads/s2015_mobile.pptx

commented

(i.e. very near term = ~3 months, little longer term = ~12 months)

And here I'm trying to add it in after the compute stuff is done. 😄 I don't mind waiting but adding clustered shading doesn't change a lot of stuff.

What I had in mind for the 3-ish months was to spend some time improving and polishing a simple forward renderer. PBR, bloom, HDR... In particular, get shadows up and running because it touches a lot of rendering systems (multiple views, multiple passes in a material, can be done in parallel with other rendering stages).

Clustered forward rendering sounds to me like the direction we want to be headed and it seems like there is consensus on that so far. Could prototyping/R&D for clustered forward happen in parallel with fleshing out the current pipeline? I think this would help limit risk of things stalling out.

commented

Could prototyping/R&D for clustered forward happen in parallel with fleshing out the current pipeline? I think this would help limit risk of things stalling out.

My thought was that we could likely create a plugin for it (similar to bevy_pbr). bevy_pbr_clustered?

@StarArawn

I'm not familiar with using NPR tonemapping in this way; do you have any articles or papers on the subject? On mobile, reducing render passes becomes much, much more important. I'm not convinced we should limit desktop by mobile (low-end) constraints though. Perhaps going down the path of splitting the renderer into two different graphs would make more sense. Internally we could build the graph based off of user settings and hardware limits.

Check out this part of the Filament docs https://google.github.io/filament/Filament.html#imagingpipeline/validation/scenereferredvisualization

Also, tonemapping (at least, global tonemapping, i.e. each fragment only has information about itself, which is the case for most tonemapping operators used in games) can simply be done as the last step of the main uber-shader; it doesn't necessarily need to be a separate render pass.
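To make "tonemapping as the last step of the shader" concrete, here is one common global operator, extended Reinhard, written as a plain Rust function. Per-fragment, it needs only the fragment's own HDR value plus a white-point constant, which is exactly why it can live at the end of the main shading function rather than in a separate pass. The function name and white-point parameter are illustrative, not Bevy API.

```rust
/// Extended Reinhard tonemapping: compresses HDR luminance toward [0, 1],
/// with `white` (the luminance treated as "pure white") mapping exactly to 1.0.
/// Applied per channel or per luminance as the final step of shading.
fn tonemap_reinhard(hdr: f32, white: f32) -> f32 {
    hdr * (1.0 + hdr / (white * white)) / (1.0 + hdr)
}
```

Since each output depends only on the fragment's own value, no neighborhood information (and hence no extra render pass) is required — unlike local tonemapping operators.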

Lots of great conversation happening here. I really appreciate the thoughtfulness and expertise you all are bringing to the table.

I will defer to you all here when it comes to clustered. It seems like a good place to start. Ideally we experiment with multiple paradigms and build things in a way that makes them reusable across paradigms. The Armory project is a pretty good example of supporting forward and deferred with the same pieces. However that isn't a hard requirement. We can always try to modularize later if we need to.

Uber shaders don't scare me as an output, but they do definitely scare me from an organizational standpoint. Using imports to create scoped (and ideally reusable) pieces of shader logic seems like the right call, with or without uber shaders.

Uber shaders can perform quite well in some contexts (ex: the dolphin emulator project had great success with their "uber shader" effort).

The Filament docs really are great. We should probably start a collection of rendering resources that we can all learn from. When you start implementing, please record in the repo what sources you used (both for giving appropriate credit and to help new contributors).

I'm starting to consolidate my thoughts on what the Bevy development process should look like. I think the main bevy crates (and bevy repo) should be for building our current best ideas for "final" implementations of core functionality. Ex: eventually bevy_pbr will contain the default pbr plugin that everyone uses. Pushing code there will be a signal that we have made a decision to take that crate in a given direction.

But spending time discussing and worrying about building the 100% correct solution will stall us and force us to get caught up in theoreticals. Almost without exception I think we should be building "fast and loose" prototype code outside of the main bevy repo, probably with some naming convention like bevy_exp_pbr_clustered, bevy_pbr_clustered_prototype, etc. Descriptiveness (when there can and should be multiple competing implementations) is ideal.

In the short term, I encourage you all to create and distribute your own crates for experimentation (while being respectful of the core bevy_XXX namespace). As specific implementations gain momentum and stability, we can then start discussing centralization of efforts.

I'll try to give appropriate visibility into the various distributed projects to help direct people's attention and avoid duplicate work.

I'll also be setting up working groups for specific focus areas (and PBR will be one of them).

commented

Almost without exception I think we should be building "fast and loose" prototype code outside of the main bevy repo, probably with some naming convention like bevy_exp_pbr_clustered, bevy_pbr_clustered_prototype, etc.

This I guess raises the next question, how do we separate out the shaders into different crates? Ideally we should have a way of sharing the PBR implementation between a bevy_clustered and a bevy_forward plugin. Calculating the lighting works exactly the same between forward and forward clustered. I'm not sure I want to replicate PBR work inside a bevy_pbr_clustered_prototype plugin. If possible the PBR plugin should likely create an include that can be shared with other plugins. However we currently don't have a good mechanism for including shaders..

Yeah, I agree that long term, breaking them up into separate crates could be beneficial (or alternatively, just separate modules in the same crate). Short term, I expect making crate divisions will hamper productivity. But if that workflow works for the implementors, I'm cool with it.

As you saw, right now shader includes don't work. I don't see a huge problem with building "big shaders" first and then breaking them up later. But it's very possible we can make includes work with a small amount of effort. I just don't want to waste too much effort on that when naga is so close.

Uber shaders don't scare me as an output, but they do definitely scare me from an organizational standpoint. Using imports to create scoped (and ideally reusable) pieces of shader logic seems like the right call, with or without uber shaders.

Uber shaders can perform quite well in some contexts (ex: the dolphin emulator project had great success with their "uber shader" effort).

Yeah, this is my thinking as well and is what I wanted to convey originally.

We had a short discussion in discord tonight RE: next short-term steps. I'll summarize here for further discussion:

Now:

  • PBR (which implies HDR)
  • Bloom (to convey HDR)
  • Shadowing (forces us to build out a real "pipeline")
  • Papercuts with current implementation

Later:

(I don't think this needs to block someone doing R&D for forward clustered rendering as a longer-term project on the side.)

commented

I opened a draft PR #261 which is a somewhat working implementation of Google's Filament. I think it's a good starting point for getting a feel for how we want PBR to work in Bevy.

commented

Thinking about bind groups and how limited we are on them, I came up with the following (WIP) non-exhaustive list:

Textures:

// Standard PBR textures
albedo: Texture - set 3 binding 1
albedo_sampler: Sampler - set 3 binding 2
normal_map: Texture - set 3 binding 3
normal_map_sampler: Sampler - set 3 binding 4
combined_roughness_metallic: Texture - set 3 binding 5
combined_roughness_metallic_sampler: Sampler - set 3 binding 6
ambient_occlusion: Texture - set 3 binding 7
ambient_occlusion_sampler: Sampler - set 3 binding 8
emissive: Texture - set 3 binding 9
emissive_sampler: Sampler - set 3 binding 10

Lighting:

// Lights
light_buffer: Buffer - set 1 binding 0

// Cluster Info
light_cull_list: Buffer - set 1 binding 1
frustums: Buffer - set 1 binding 2 // Camera frustum separated out into smaller frustums

// Shadows
// Represent directional/point/spot shadow maps. Point light shadow maps would be as a cube map in 2D space.
// Optionally we can drop this down to a single shadow map atlas texture, however I'm not sure if that's any better.
// From my testing you can more smartly allocate/de-allocate memory using arrays, however it does eat up an 
// additional 3 slots for textures.
// The resolution and the number of allocations per resolution could be a user setting or we can hide it behind
//  a single setting called ShadowQuality. 
shadow_map_1: Texture2DArray - set 1 binding 3  // highest resolution shadow maps ex: 1024x1024 x 8
shadow_map_2: Texture2DArray - set 1 binding 4 // medium resolution shadow maps ex: 512x512 x 16
shadow_map_3: Texture2DArray - set 1 binding 5 // low resolution shadow maps ex: 256x256 x 32
shadow_map_4: Texture2DArray - set 1 binding 6 // lowest resolution shadow maps ex: 128x128 x 64
shadow_sampler: Sampler - set 1 binding 7 // Note: We might want more samplers depending on the light type.

// IBL
// Similar to the storage/allocation strategy for shadow maps.
// Internally to the texture cube array we would have groups of two probes(diffuse irradiance, specular).
probe_map_1: TextureCubeArray - set 1 binding 8 // highest resolution probe maps
probe_map_2: TextureCubeArray - set 1 binding 9 // medium resolution probe maps
probe_map_3: TextureCubeArray - set 1 binding 10 // low resolution probe maps
probe_map_4: TextureCubeArray - set 1 binding 11 // lowest resolution probe maps
probe_sampler: Sampler - set 1 binding 12

This brings us to 13 textures, which leaves a little wiggle room under the WebGPU minimum limit of maxSampledTexturesPerShaderStage: 16.

We can request more than 16, but remember that it will then no longer align with the WebGPU minimum spec: it won't run on all hardware and may not run on the web at all.
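The tiered shadow-map strategy in the listing above (four Texture2DArray tiers of decreasing resolution) implies a small CPU-side allocator. Here is a hypothetical sketch of one; the tier sizes mirror the example comments (1024 x8, 512 x16, 256 x32, 128 x64), and none of the names are real Bevy types.

```rust
// Illustrative allocator for the four-tier shadow map arrays described above.
struct ShadowAtlas {
    /// (resolution, free slots remaining) per tier, highest resolution first.
    tiers: [(u32, u32); 4],
}

impl ShadowAtlas {
    fn new() -> Self {
        // Tier sizes from the example comments: 1024x1024 x8 ... 128x128 x64.
        Self { tiers: [(1024, 8), (512, 16), (256, 32), (128, 64)] }
    }

    /// Allocate a slot, preferring the highest-resolution tier with space.
    /// Returns (tier index, resolution), or None if every tier is full.
    fn allocate(&mut self) -> Option<(usize, u32)> {
        for (i, (res, free)) in self.tiers.iter_mut().enumerate() {
            if *free > 0 {
                *free -= 1;
                return Some((i, *res));
            }
        }
        None
    }
}
```

A real version would prioritize by light importance (distance, screen coverage) rather than first-come-first-served, and free slots when lights are culled, but this shows the allocate/de-allocate flexibility that arrays buy over a single atlas.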

It might not be ready for use for a while, but it would be awesome if we could leverage Embark's recently announced Rust GPU project:

https://github.com/EmbarkStudios/rust-gpu

#1554 has merged, adding a lot of basic PBR functionality!

commented

This is a bit far ahead of what's currently implemented, but some form of global illumination would be nice to have (something like voxel cone tracing, or GI and soft shadows via distance fields).

I'm not sure if it's a good idea to extend this issue or if we want to create a new one, but I think it would be good to look into the options for higher-level rendering features and figure out some kind of plan for what we'd like to implement. I think this would help provide focus for render feature implementation, and also give more coverage to the renderer rework effort that is happening at the time of writing. For readers: I'm just a random interested party in the community. :)

Main Target and High Fidelity Target

Discussions have been held on Discord in the rendering channel that lean toward having a main render path that 'works anywhere' - no clear guidelines were decided upon but @cart suggested perhaps 'works on 10-year-old desktop hardware' was not unreasonable. This is around the NVIDIA GeForce 500/600 series, and AMD Radeon HD 6000/7000 series (HD 7000 was the start of GCN architecture).

Some others, me included, would like to use more advanced features of modern hardware and APIs to achieve graphical results with higher fidelity. I think @StarArawn noted the 'two target' approach first, similar to Unity's Universal Render Pipeline and High-Definition Render Pipeline. It was suggested by people who know better that this would likely result in different architectures as we see the high fidelity state of the art renderers tending to use a deferred architecture, using more compute shaders, hardware accelerated ray tracing, mesh shaders, etc.

This seems like it would lead to a plan to focus on building the main 'works anywhere' target first, though of course the community can build in whatever order it wants.

NOTE: This is just a brain dump of some things I've been looking at recently. There are a lot more pieces to consider, and we'd probably want to dig a bit deeper into the pros and cons, and into what can fit into the main 'works anywhere' target.

Lighting

Ambient occlusion

Purpose

Ambient occlusion addresses medium to long range occlusion of ambient light (which is light that has undergone many bounces and is in some sense 'omnidirectional'). Note that it should be used in conjunction with the occlusion texture in PBR models as that occlusion texture provides more fine-grained, short-range occlusion information.

Options

  • SSAO (Screen Space Ambient Occlusion)
    • Involves sampling a number of points within a hemisphere around the surface normal to estimate the proportion of the directions from which light can reach the surface.
    • Screen Space Directional Occlusion seems to be an improvement on this but is computationally expensive. Deferred Screen Space Directional Occlusion has most of the quality benefits, and is much cheaper. https://kayru.org/articles/dssdo/
  • GTAO (Ground-Truth Ambient Occlusion)
    • State of the art horizon-based method. Quite a bit more complicated to implement but superior results. Should be a drop-in replacement for SSAO as it has the same data input/output requirements.
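The SSAO option above ("sampling a number of points within a hemisphere around the surface normal") starts from a precomputed sample kernel. Below is a hedged sketch of generating one; the bias-toward-origin scaling is the standard trick so occlusion close to the surface is sampled more densely. The tiny LCG stands in for a real RNG purely to keep the example self-contained; none of this is Bevy code.

```rust
// Minimal deterministic pseudo-random generator (for self-containment only).
struct Lcg(u64);
impl Lcg {
    /// Next pseudo-random f32 in [0, 1).
    fn next_f32(&mut self) -> f32 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 40) as f32 / (1u64 << 24) as f32
    }
}

/// Generate `n` sample offsets inside the +Z unit hemisphere (tangent space),
/// with sample distances biased toward the origin.
fn ssao_kernel(n: usize) -> Vec<[f32; 3]> {
    let mut rng = Lcg(42);
    (0..n)
        .map(|i| {
            // Random direction: x, y in [-1, 1], z in [0, 1) => +Z hemisphere.
            let mut v = [
                rng.next_f32() * 2.0 - 1.0,
                rng.next_f32() * 2.0 - 1.0,
                rng.next_f32(),
            ];
            let len = (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]).sqrt().max(1e-6);
            // Normalize, then shrink early samples so they cluster near the
            // surface (quadratic ramp from 0.1 to 1.0 across the kernel).
            let t = i as f32 / n as f32;
            let scale = (0.1 + 0.9 * t * t) * rng.next_f32().max(0.1);
            for c in v.iter_mut() {
                *c = *c / len * scale;
            }
            v
        })
        .collect()
}
```

At shading time each kernel sample is rotated into the surface's tangent frame, projected into screen space, and depth-tested against the depth buffer; the fraction of occluded samples is the AO term.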

Status

I have a mostly-working SSAO implementation, but it relies on: multiple branches to aid doing fullscreen passes (which are needed for bloom anyway); global resource bindings (the render resource nodes are going away / being reworked); and Draw/RenderPipelines taking a type parameter so that multiple such components can be added to entities in order to be consumed in multiple separate render passes (I think this also goes away, or perhaps needs modifying, after the renderer rework).

Global Illumination

Global illumination is about accounting for the indirect contributions from 'directional' (i.e. non-ambient) light sources such as directional / point / area / etc lights.

Options

commented

To add to GI methods:

  • UE4 had light propagation volumes, and another form of GI that works with distance fields
  • Godot has voxel cone tracing as well, which according to the rendering dev is more efficient than the NVIDIA impl
  • Sonic Ether (from the Minecraft shaders) also made a GI method a while ago, but iirc it requires a deferred renderer
  • if possible, light mapping is a nice-to-have option, as it's fast for static scenes

Ambient occlusion:

  • IQ (https://www.iquilezles.org/) has a method for AO that combines shadow maps + screen space for more low-res detail
  • providing some option to pre-bake AO would be nice to have

It's been suggested on Discord that this should probably be a separate discussion. GI and AO could perhaps go into one. But I'd also like somewhere to discuss an overview and maintain the current state of renderer discussions.

Other ideas that come up are dynamic sky / atmosphere, volumetric lighting and fog, crepuscular rays, depth of field, chromatic aberration, and other physical camera things, colour grading, clouds, weather. A number of those would be better implemented as plugins and considering how to do that would impact the APIs of the core renderer.

The point isn't to plan out exactly what and how to do everything up front, but rather to give some idea of what things are needed/desirable, what methods are good, what rough order they should be done in with respect to the visual benefits they bring, what can be done on top of bevy_render / bevy_pbr straight away, and what needs more work to be done there first, etc.

@cart - where do you think such an overview should be discussed and a summary be maintained?

where do you think such an overview should be discussed and a summary be maintained?

This should probably start as a github discussion. Most of the features discussed are orthogonal to each other, so I don't see much value in having an official "global list and implementation order" for all render features. But it would be good to identify which features intersect with each other (and how). And collecting implementation details / algorithm options / requirements for each feature seems valuable. This might help inform renderer api decisions. Feel free to create a github discussion if you want to facilitate this "information collection and consolidation" effort.

But when it comes time to start making "official plans", each feature should have its own RFC so we can discuss it in detail.

As a note, @ChangeCaps implemented bloom here: #2876 and I have a branch that implements clustered-forward rendering: https://github.com/superdump/bevy/tree/clustered-forward-rendering

PR for clustered forward rendering is up:

Closing this out, since #3153 merged!