immersive-web / performance-improvements

A feature-incubation repo for XR-related performance improvements. Feature lead: Trevor F. Smith

Expose combined frustum from both eyes

fernandojsg opened this issue · comments

Currently the API provides left/right projection and view matrices so that JS developers don't need to compute them.
Along these lines, I thought it would be nice if the API exposed a feature commonly needed by every engine: the combined frustum for performing culling for both eyes at once (as described, for example, by Cass Everitt in https://www.facebook.com/photo.php?fbid=10154006919426632&set=a.46932936631.70217.703211631&type=1&theater)

This is a great idea! For reference, here's how we expose the same functionality for Windows Mixed Reality: the TryGetCullingFrustum API. For what it's worth, we also expose the combined view frustum, but that's not conservative enough for culling because of reprojection. For now, just the culling frustum would be a great addition to the WebVR API.

I think this is an excellent idea as well. When VRFieldOfView is removed (now deprecated), this would be an essential part of the API.

I'm fine with adding this, though I'm not entirely sure of the best format to surface it in. Win Holographic returns 6 planes, which seems sensible. You could also express that as a projection matrix, though, which might be more concise.

we also expose the combined view frustum, but that's not conservative enough for culling because of reprojection

@NellWaliczek: I'm curious about this. Does Win Holographic also return enlarged projection matrices to give reprojection some wiggle room? Otherwise why would a more conservative culling frustum make a difference?

The perils of typing too quickly! ^_^ What I meant to say is that the combined view frustum is not conservative enough for culling because it doesn't account for likely updates in pose prediction. For example, when combined with the UpdateCurrentPrediction API, a developer could early cull with the conservative frustum and then update the pose before drawing.

I think six planes makes more sense than a projection matrix, because it's both self-documenting and convenient. With six planes you don't run the risk of anyone accidentally trying to use it for rendering/projecting something (since that use case doesn't really make much sense).
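For what six planes from a matrix could look like in practice: the classic Gribb/Hartmann row-combination method extracts all six clip planes from any view-projection matrix. A minimal sketch, assuming column-major WebGL-style matrices and inward-facing plane normals (both assumptions, not anything the spec has settled on):

```javascript
// Extract six clip planes from a 4x4 view-projection matrix (column-major,
// WebGL layout) via the Gribb/Hartmann row-combination method. Each plane is
// [a, b, c, d] with a*x + b*y + c*z + d >= 0 for points inside the frustum.
function extractFrustumPlanes(m) {
  // row(i) of the matrix in column-major storage
  const row = i => [m[i], m[4 + i], m[8 + i], m[12 + i]];
  const combine = (r0, r1, sign) => r0.map((v, k) => v + sign * r1[k]);
  const r0 = row(0), r1 = row(1), r2 = row(2), r3 = row(3);
  return [
    combine(r3, r0, +1), // left
    combine(r3, r0, -1), // right
    combine(r3, r1, +1), // bottom
    combine(r3, r1, -1), // top
    combine(r3, r2, +1), // near
    combine(r3, r2, -1), // far
  ];
}
```

With the identity matrix this yields the six faces of the NDC cube, which is a handy sanity check for any implementation.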

@NellWaliczek That's a very good point. I'd suggest that this WebVR call be named similarly, explicitly stating the intended usage: rather than "combined frustum", it should be something like "cullingFrustum". The calculation of the planes should be defined by the VR implementation and simply guaranteed to be large enough to include the volume that could possibly be presented in the next frame.

Many game engines perform culling asynchronously (i.e. it could be done in a WebWorker). Would there be a benefit to passing a duration or frame count into a getter function, informing the implementation to return a more conservative frustum that accounts for the latency of the culling job?

We could alternatively leave this to content if no additional information is available to the VR APIs. A typical content-side solution might be to cull against a swept volume generated from the instantaneous culling planes and the pose-change velocity.
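A crude content-side approximation of that idea, rather than a true swept volume, is to push every culling plane outward by the largest displacement the pose could undergo while the asynchronous culling job runs. The plane format and the linear-plus-angular motion bound below are illustrative assumptions, not anything from the thread:

```javascript
// Sketch: dilate culling planes to cover likely pose motion during culling
// latency. Plane format [a, b, c, d] with unit-length normal (a, b, c) and
// "inside" meaning a*x + b*y + c*z + d >= 0.
function sweepPlanes(planes, linearSpeed, angularSpeed, boundRadius, latencySeconds) {
  // Linear motion moves the viewer at most linearSpeed * latency; rotation
  // sweeps points near the scene bound by at most angularSpeed * latency *
  // boundRadius (a crude small-angle bound). Increasing d pushes each plane
  // outward along its normal, enlarging the volume conservatively.
  const slack = (linearSpeed + angularSpeed * boundRadius) * latencySeconds;
  return planes.map(([a, b, c, d]) => [a, b, c, d + slack]);
}
```

Passing a latency of 0 returns the planes unchanged, matching the "value of 0 would effectively disable it" idea raised later in the thread.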

I'd be reluctant to not expose this consistently for all platforms. This isn't something like room scale bounds where it's just not applicable to certain hardware. If the underlying API can provide a good value that's great, but if not we should just do the damn math for the page. Codify the default equation for computing it into the spec and state that's what we'll return if the platform doesn't provide a better answer. Otherwise we're expecting the pages to do some non-trivial matrix voodoo to get out a reasonable value, which they probably won't do correctly or consistently.

So my next question is: Does anyone know the math for this? 🙄 I've been poking around today at trying to get something implemented for @mrdoob's WIP WebVRCamera (https://github.com/mrdoob/three.js/blob/dev/examples/js/vr/WebVRCamera.js#L113) and, well, this type of matrix-foo is just not my strong point. (Says the guy who wrote one of the more popular JS matrix libraries...) These diagrams have been helpful, but I still don't feel like I've nailed it yet.

@toji Agree. It should always be available, even if there is no lower-level function in the underlying API.

One foolproof (i.e. naive but simple) way to do this would be to transform the points at the 8 corners of the frustum using the inverse of the projection matrices. Do this for both the left and right eyes. Then, find the maximum FOV on each axis from an origin point centered between the eyes. This should, in theory, work for any arbitrary matrix passed in, even with off-axis projections, and should generate a conservative volume suitable for culling.
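A sketch of that max-FOV-per-axis idea, under one simplifying assumption: the projection matrices have the off-axis perspective form produced by gl-matrix's mat4.perspectiveFromFieldOfView (column-major), as WebVR 1.1 implementations use. For matrices of that form, recovering the four half-angle tangents from the matrix entries is equivalent to unprojecting the frustum corners, so no explicit matrix inverse is needed; the eye-offset parameters and function name are illustrative:

```javascript
// Combine two eye frusta into one conservative frustum, expressed as the
// tangents of four half-angles measured from an origin midway between the
// eyes. Assumes column-major off-axis perspective matrices in the shape
// produced by gl-matrix's mat4.perspectiveFromFieldOfView.
function combinedFrustumTangents(leftProj, leftOffsetX, rightProj, rightOffsetX, near) {
  // Tangents of the four half-angles encoded in one projection matrix.
  const tangents = p => ({
    left:  (1 - p[8]) / p[0],
    right: (1 + p[8]) / p[0],
    up:    (1 + p[9]) / p[5],
    down:  (1 - p[9]) / p[5],
  });
  const mid = (leftOffsetX + rightOffsetX) / 2;
  const out = { left: 0, right: 0, up: 0, down: 0 };
  [[leftProj, leftOffsetX], [rightProj, rightOffsetX]].forEach(([p, x]) => {
    const t = tangents(p);
    const dx = x - mid; // this eye's horizontal offset from the combined origin
    // Re-measuring a shifted frustum edge from the shared origin is widest at
    // the near plane, so using `near` keeps the result conservative.
    out.left  = Math.max(out.left,  t.left  - dx / near);
    out.right = Math.max(out.right, t.right + dx / near);
    out.up    = Math.max(out.up, t.up);
    out.down  = Math.max(out.down, t.down);
  });
  return out;
}
```

The result over-culls nothing but grows as `near` shrinks, which is exactly the "overly pessimistic side planes" concern raised further down the thread.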

@toji How do you feel about doing the more complex swept-volume? It would need to take into account the angular momentum and positional velocity and extrude the volume further. Passing in a value of 0 would effectively disable it.

I would be fine writing this as an example for advanced users, but perhaps it would be easier for devs to get this directly from the API.

Simpler game engines are likely to just extrude all planes by a non-adaptive amount (i.e. just make it 25% bigger).
More complex game engines may want to do their own optimizations beyond the swept volume.
Should we just stick with the non-predictive culling volume calculation if the underlying API doesn't provide the swept volume? Or should we implement a fall-back that also does the predictive swept volume?

If we decide to implement an always-available predictive swept volume, I'd be glad to write a reference implementation.

This is actually a pretty fun little problem, since we really can't make any device specific assumptions about the frusta!

I couldn't find any obvious "best practice" algorithm for this. I quite like @kearwood's idea, but I'm concerned that the combined origin really ought to be pushed back a bit so as to not produce overly pessimistic "side planes", and I'm not sure how to determine the depth at which it should sit. If there's a simple heuristic for choosing the somewhat arbitrary "combined origin" up front (especially its depth), it would be hard to beat the simplicity of that approach.

Another idea, more complicated but not reliant on choosing an origin up front (and possibly yielding slightly tighter frusta): flatten all the "near" corner points to the closest Z value and compute a 2D OBB for them (using your favorite 2D OBB generation algorithm, such as PCA). Then, for each edge in this 2D OBB, build a plane from those two near corner points and the first "far" corner point; loop through the rest of the far corner points and, if a corner is outside the current plane, replace the far corner in your triplet of points and recompute the plane.
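A sketch of just the 2D OBB step of that algorithm, using PCA as suggested: fit an oriented box to the flattened near-plane corner points. (A full version would go on to build side planes from each OBB edge and the far corners, as described above; this fragment and its names are illustrative only.)

```javascript
// Fit a 2D oriented bounding box to a point set via PCA: take the principal
// axis of the covariance matrix as the box orientation, then project points
// onto both axes to find the extents.
function obb2d(points) { // points: array of [x, y]
  const n = points.length;
  const mean = points.reduce(([mx, my], [x, y]) => [mx + x / n, my + y / n], [0, 0]);
  // Entries of the 2x2 covariance matrix (unnormalized; scale doesn't matter).
  let cxx = 0, cxy = 0, cyy = 0;
  for (const [x, y] of points) {
    const dx = x - mean[0], dy = y - mean[1];
    cxx += dx * dx; cxy += dx * dy; cyy += dy * dy;
  }
  // Principal-axis angle of a 2x2 symmetric matrix (closed form).
  const theta = 0.5 * Math.atan2(2 * cxy, cxx - cyy);
  const axes = [
    [Math.cos(theta), Math.sin(theta)],
    [-Math.sin(theta), Math.cos(theta)],
  ];
  // Project points onto both axes to get the box extents.
  const extents = axes.map(([ax, ay]) => {
    let lo = Infinity, hi = -Infinity;
    for (const [x, y] of points) {
      const t = (x - mean[0]) * ax + (y - mean[1]) * ay;
      lo = Math.min(lo, t); hi = Math.max(hi, t);
    }
    return [lo, hi];
  });
  return { center: mean, axes, extents };
}
```

Worth noting that a PCA-fit OBB is not guaranteed minimal, but for eight near-plane corners from two similar eye frusta it should be close enough for a conservative culling volume.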

I'm in favor of ssylvan's proposed algorithm as a reference implementation and would probably implement the Firefox algorithm this way for regular stereoscopic headsets.

In the case of magic windows, this should be a compatible concept.

In the case of a C.A.V.E. or other system with greater than 180 degree FOV, the 6 planes could simply be placed far enough from the origin to avoid culling anything at maximum z-depth.

The spec could say something to the effect of "If a point is on the outside of any of the 6 planes, the point is guaranteed not to be visible on the display within the near and far depth and can be culled."
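That guarantee translates almost directly into a culling test. A sketch, assuming planes arrive as `[a, b, c, d]` arrays with inward-facing normals (so a point is inside a plane when `a*x + b*y + c*z + d >= 0`):

```javascript
// Returns true if the point is guaranteed not to be visible: per the spec
// wording above, being outside any single plane is sufficient to cull.
function canCull(point, planes) {
  return planes.some(([a, b, c, d]) =>
    a * point[0] + b * point[1] + c * point[2] + d < 0);
}
```

Engines would typically use the same test per bounding-sphere (comparing the signed distance against `-radius` instead of 0), but the point form shows the contract the spec text implies.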

@kearwood Are you still interested in writing a reference implementation?
three.js has an implementation that gets the combined frustum from the projection matrices. However, because it makes incorrect assumptions, the algorithm returns incorrect results.

This feels like a pretty important thing to have. Frustum culling is a pretty basic requirement for any sort of production renderer.

I can second this, having some mechanisms that make frustum culling easier would be very helpful.

How is this problem handled in OpenXR?