likangning93 / Project6-WebGL-Deferred-Shading

CIS 565 Project 6: WebGL Deferred Shader


WebGL Deferred Shading

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 6

  • Kangning Li
  • Tested on: Google Chrome 46.0.2490.80 on Ubuntu 14.04, i5-3320M @ 2.60GHz 8GB, Intel HD 4000 (Personal laptop)

Live Online

Demo Video

This repository contains a WebGL deferred shader with the following features:

  • deferred shading using WEBGL_draw_buffers
  • a toggleable "compressed g-buffer" pipeline
  • toggleable scissor testing for both the compressed and uncompressed pipelines
  • a tile-based lighting pipeline
  • toon shading
  • bloom as a post-processing effect

Running the demo above requires support for OES_texture_float, OES_texture_float_linear, WEBGL_depth_texture, and WEBGL_draw_buffers. You can check your support on WebGL Report.

Deferred Shading Overview

The standard deferred shader in this project renders data about what is visible in the scene (position, normals, sampled color, depth, etc.) to a set of WebGL textures referred to as g-buffers. These can be viewed with the debugView settings in the demo. The textures are then passed to a lighting shader that performs lighting calculations only on what is visible in the scene. Finally, a post-processing shader can add effects like bloom (implemented here) or, if g-buffers are also passed in, efficient depth-of-field simulation and toon-shading edge detection.

G-buffer compression

The default pipeline uses 4 g-buffers of vec4s to pass scene information to the lighting shader, along with a buffer for depth:

  • position
  • normal provided by the scene geometry
  • texture mapped color
  • texture mapped normal

The "compressed" pipeline instead uses 2 g-buffers along with depth:

  • texture mapped color
  • "compressed" 2-component normal (computed from the texture-mapped and geometry normals)

This compression and decompression of the normal depends on the normal being unit length, which lets the lighting shader compute the magnitude of the normal's `z` component from its `x` and `y` components. The sign of the `z` component is sent as part of the `y` component by padding: if the `z` component is negative, the `y` component is "padded" with a constant so that its magnitude is greater than 1. The lighting shader then only needs to check the `y` component's magnitude to determine the sign of `z` and correctly rebuild the `y` component.
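A minimal GLSL sketch of this packing scheme is below; the padding constant and function names are illustrative assumptions, not the repository's exact values:

```glsl
// Sketch of 2-component normal packing. Z_PAD is a hypothetical constant;
// it only needs to push |y| past 1.0 when z is negative.
const float Z_PAD = 3.0;

vec2 packNormal(vec3 n) {                   // n is assumed to be unit length
    float y = n.y;
    if (n.z < 0.0) {
        y += (y >= 0.0) ? Z_PAD : -Z_PAD;   // flag negative z by padding y
    }
    return vec2(n.x, y);
}

vec3 unpackNormal(vec2 enc) {
    float x = enc.x;
    float y = enc.y;
    float zSign = 1.0;
    if (abs(y) > 1.0) {                     // padding present: z was negative
        zSign = -1.0;
        y -= (y > 0.0) ? Z_PAD : -Z_PAD;
    }
    // max() guards against tiny negative values from floating-point error
    float z = zSign * sqrt(max(0.0, 1.0 - x * x - y * y));
    return vec3(x, y, z);
}
```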

The lighting shader also reconstructs the world position of a pixel from its depth and screen coordinates with the current view's camera matrix. More details on the technique can be found here and here.
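A rough sketch of that reconstruction is below, assuming an inverse view-projection uniform and a depth value in [0, 1]; the uniform and varying names are illustrative, not the repository's:

```glsl
// Sketch of world-position reconstruction from depth.
uniform sampler2D u_depthTex;
uniform mat4 u_invViewProj;   // inverse of (projection * view)
varying vec2 v_uv;            // screen-space UV in [0, 1]

vec3 reconstructWorldPos() {
    float depth = texture2D(u_depthTex, v_uv).x;          // depth in [0, 1]
    vec4 ndc = vec4(vec3(v_uv, depth) * 2.0 - 1.0, 1.0);  // to NDC [-1, 1]
    vec4 world = u_invViewProj * ndc;
    return world.xyz / world.w;                           // perspective divide
}
```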

Using "compressed" g-buffers is essentially a tradeoff between memory access and computation, which is usually a good trade for GPU applications since GPUs are generally better at computation than at memory access. Even in this imperfect case, in which the 2-component normals are still stored in a vec4 texture, reducing the number of g-buffers leads to a noticeable improvement in performance. This improvement persists as the number of lights increases, since both pipelines run the lighting shader once per light.

Scissor test

Both the "compressed" and "uncompressed" g-buffer pipelines can optionally restrict the screen-space render area of each light using a scissor test, which in most cases speeds up the lighting shader computation for each light. A screen-space bounding box is computed for each light on the CPU, which then restricts the area that the GPU can draw over. This scissor test also allows us to skip lighting for lights that couldn't possibly be visible in the viewport, which is likely a large part of the performance boost.

However, the scissor test is only really useful for the "general" case, where a light's influence covers a relatively small area of the screen. In the case that a light is very close to the camera, the scissor test becomes less beneficial as the light pass for that particular light will essentially span the entire screen.

Tile based lighting

The lighting pipelines mentioned above all compute each light's influence independently and blend the results, essentially issuing one draw call per light, which leads to repeated access to the g-buffers and the framebuffer being drawn to.

One alternative is to inform the shader of the lights that influence an area of the scene: the shader reads the position, normal, and color data once, iterates over those lights, and writes the final accumulated result out to the framebuffer. This technique requires splitting screen space into tiles and computing a data structure the shader can read to determine which lights influence a particular tile.

This implementation of a tiled pipeline stores this data structure in another g-buffer as lists of light positions and colors, which limits the number of lights that can influence each tile based on the tile's resolution. Each screen-space tile of this g-buffer is split in two, with one half holding a list of light colors as vec4s and the other half holding the lights' positions and radii. The end of the list is marked with a colorless light with a negative influence radius and an extreme z position, which shows up as a blue pixel in the debug view. Thus, the limit on the number of lights per tile is TILE_SIZE * TILE_SIZE / 2, which for a 32 x 32 tile is still 512 lights.
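The lighting shader's per-tile loop might look roughly like the sketch below. The texel addressing (colors in the tile's top half, positions and radii in the bottom half) and the attenuation model are assumptions for illustration, not the repository's exact layout:

```glsl
// Assumed layout: light i's color lives at (i mod 32, i / 32) within the
// tile's top half; its position (xyz) and radius (w) sit 16 rows below.
uniform sampler2D u_lightTiles;       // tile data g-buffer, assumed screen-sized
uniform vec2 u_texelSize;             // 1.0 / resolution
const float TILE_SIZE = 32.0;
const int MAX_LIGHTS_PER_TILE = 512;  // 32 * 32 / 2

vec3 shadeTile(vec2 tileOrigin, vec3 worldPos, vec3 normal, vec3 albedo) {
    vec3 result = vec3(0.0);
    for (int i = 0; i < MAX_LIGHTS_PER_TILE; i++) {
        float fi = float(i);
        vec2 slot = vec2(mod(fi, TILE_SIZE), floor(fi / TILE_SIZE));
        vec2 posUV = (tileOrigin + slot + vec2(0.5, TILE_SIZE * 0.5 + 0.5)) * u_texelSize;
        vec4 posRad = texture2D(u_lightTiles, posUV);
        if (posRad.w < 0.0) {
            break;  // sentinel light with negative radius ends the list
        }
        vec3 toLight = posRad.xyz - worldPos;
        float dist = length(toLight);
        if (dist < posRad.w) {
            vec2 colorUV = (tileOrigin + slot + 0.5) * u_texelSize;
            vec3 lightCol = texture2D(u_lightTiles, colorUV).rgb;
            float atten = max(0.0, 1.0 - dist / posRad.w);
            result += atten * max(0.0, dot(normal, toLight / dist)) * albedo * lightCol;
        }
    }
    return result;
}
```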

This data structure is computed on the CPU by iterating over the lights in order, which becomes a performance bottleneck with more lights. Thus, this implementation includes options in the code to cap the number of lights that can influence a tile below the maximum. However, this can lead to undesirable artifacting, since lights that should prominently influence a tile may be omitted when the tile's light list is built. This implementation attempts to alleviate the problem with an option to sort the lights by their z-depth from the camera before computing the light list, which reduces artifacting on some brightly lit planes that are close to the camera and roughly parallel with the image plane.

without tiling, with tiling, and depth sorted

However, this does not completely resolve the problem for scenes that have multiple parallel planes at different depths.

without tiling, with tiling, and depth sorted

(all data measured using 32x32 tiles)

Tiling leads to noticeable performance improvements in scenes with large numbers of lights. However, the computation of the tile data structure on the CPU is again a major performance bottleneck in scenes that have fewer lights but many tiles.

Bloom

Bloom works by sampling the area around each pixel of the lighting pass's output for pixels with a luminance greater than 1, indicating an area that is "brightly lit." Because it is implemented as a post-processing step, it works across all lighting pipelines and adds a performance cost that stays constant as the number of lights in the scene grows.
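A rough sketch of that bright-pass gather is below; the 5x5 kernel, Rec. 709 luminance weights, and uniform names are assumptions rather than the repository's exact choices:

```glsl
precision highp float;

uniform sampler2D u_lit;     // output of the lighting pass
uniform vec2 u_texelSize;    // 1.0 / screen resolution
varying vec2 v_uv;

void main() {
    vec3 bloom = vec3(0.0);
    // Gather a 5x5 neighborhood and keep only "brightly lit" samples.
    for (int x = -2; x <= 2; x++) {
        for (int y = -2; y <= 2; y++) {
            vec3 c = texture2D(u_lit, v_uv + vec2(x, y) * u_texelSize).rgb;
            float lum = dot(c, vec3(0.2126, 0.7152, 0.0722));
            if (lum > 1.0) {
                bloom += c / 25.0;   // simple box average over the 25 taps
            }
        }
    }
    gl_FragColor = vec4(texture2D(u_lit, v_uv).rgb + bloom, 1.0);
}
```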

Toon Shading

Toon shading is accomplished in a lighting pass as a variant on the standard 4-g-buffer, non-tiled pipeline. Similar to bloom, toon shading works by sampling, at each screen coordinate, the surrounding coordinates. However, it does this with the normal g-buffer and the depth buffer, allowing depth-based and normal-based edge detection. It also applies a ramp to the Blinn-Phong lighting computation.
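The two pieces look roughly like the sketch below; the band count and edge thresholds are assumed values, not the repository's:

```glsl
// Quantize the Blinn-Phong diffuse term into a small number of bands.
float toonRamp(float diffuse) {
    return floor(diffuse * 4.0) / 4.0;        // 4 bands, chosen arbitrarily here
}

// Edge test against a neighboring sample from the normal and depth g-buffers.
bool isEdge(vec3 n, vec3 nNeighbor, float depth, float depthNeighbor) {
    return dot(n, nNeighbor) < 0.8            // normals diverge across a crease
        || abs(depth - depthNeighbor) > 0.01; // depth discontinuity
}
```

In the lighting pass, each fragment runs an edge test like this against its screen-space neighbors and draws a dark contour where it passes.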

The cost of this additional sampling is multiplied by the fact that the toon shader must be run once per light, making toon shading, and especially edge detection, in the lighting pass very expensive in terms of render time. The scene above, which contains 20 lights, renders in about 80 ms normally (no scissor test) but takes about 185 ms with toon shading. Rendering with the scissor test provides an enormous improvement by eliminating unnecessary lights, but the gap is still about 40 ms versus 55 ms, which further illustrates that this implementation of toon shading only gets more expensive as more lights are added.

This problem could be sidestepped by passing the normal and depth buffers to the post-processing shader for edge detection and contour drawing, making the performance hit constant, as with bloom.
