immersive-web / depth-sensing

Specification: https://immersive-web.github.io/depth-sensing/
Explainer: https://github.com/immersive-web/depth-sensing/blob/main/explainer.md


Potentially incorrect wording in the specification

bialpio opened this issue

When going over the spec for issue #43, I realized that we may have a mismatch between what the specification says and what we do in our ARCore-backed implementation in Chrome.

Namely, the spec says that in the buffer that we return, "each entry corresponding to distance from the view's near plane to the users' environment".

ARCore's documentation seems to have a conflicting phrasing:

  1. In ArFrame_acquireDepthImage(), we have "Each pixel contains the distance in millimeters to the camera plane".
  2. In the Developer Guide, we have "Given point A on the observed real-world geometry and a 2D point a representing the same point in the depth image, the value given by the Depth API at a is equal to the length of CA projected onto the principal axis" (where C is the camera; see the sketch below for what that projection amounts to).
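
To make 2) concrete: "the length of CA projected onto the principal axis" is the eye-space depth of the point, i.e. the component of the camera-to-point vector along the camera's viewing direction, not the Euclidean distance |CA|. A minimal sketch (all names here are illustrative, not taken from any API in this thread):

```ts
// Sketch only: eye-space depth of point A as seen from camera C.
type Vec3 = [number, number, number];

const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

// principalAxis is assumed to be a unit vector along the camera's viewing direction.
function depthAlongPrincipalAxis(cameraPos: Vec3, principalAxis: Vec3, pointA: Vec3): number {
  // Component of (A - C) along the principal axis: eye-space depth, not the length of CA itself.
  return dot(sub(pointA, cameraPos), principalAxis);
}
```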

If ARCore returns data according to 1), then I think it'd be acceptable to leave the spec text as-is, but then our implementation may not be correct (namely, I think we may be running into the same issue that forces @cabanier to expose at the very least the near plane distance that ARCore uses internally?).

If ARCore returns data according to 2), then the values in the buffer we return are not going to depend on the near plane. In this case, we are not compliant with the spec (we don't return the distance from the near plane to the user's environment), and the only way to become compliant would be to adjust each entry in the buffer - this may be expensive given that it'll happen on the CPU. IMO the best way to fix this would be to change the spec prose here, but I think this may be considered a breaking change, so we'll need to discuss how to move forward.
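
For a sense of what that per-entry adjustment would look like, here is a hypothetical sketch. It assumes the incoming buffer holds eye-space depth in millimeters (the format ARCore's depth image uses) and that "distance from the view's near plane" simply means the eye-space depth minus the near-plane distance, clamped at zero; the function name and signature are made up for illustration:

```ts
// Hypothetical sketch, not the Chrome implementation: shift every entry from
// "eye-space depth from the camera" to "distance from the view's near plane".
function shiftDepthToNearPlane(depthMm: Uint16Array, nearPlaneMeters: number): Uint16Array {
  const nearMm = Math.round(nearPlaneMeters * 1000);
  const adjusted = new Uint16Array(depthMm.length);
  for (let i = 0; i < depthMm.length; i++) {
    // One pass over the whole buffer, on the CPU, every time fresh depth data arrives.
    adjusted[i] = Math.max(0, depthMm[i] - nearMm);
  }
  return adjusted;
}
```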

I'm going to try to confirm with the ARCore team what their actual behavior is; I'm not sure this issue is actionable until that happens.

I was told that the OpenXR API returned near and far plane as well as fov to make sure that the code that calculates the scene will use the same matrices as the system code that calculates the depth texture.
The values in the buffer can be used directly by the shader. I think that means that this matches with point 2.

Are you sure that you need to do the adjustment in that case? Are you adding the near plane distance in your shaders?

I was told that the OpenXR API returned near and far plane as well as fov to make sure that the code that calculates the scene will use the same matrices as the system code that calculates the depth texture.

Speaking of OpenXR, can you point me to the API or extension in OpenXR that you use for this?

The values in the buffer can be used directly by the shader. I think that means that this matches with point 2.

If the values in the buffer can be used directly by the shader for occlusion, then I'm 99% sure that they match point 1. It'd mean that they already went through some projection matrix (and yes, if you don't know the near, far, & FOV of that matrix, there's not much you can do with the data), which means they are going to be normalized to the range [0, 1], where 0 means that the user's environment is at the camera's near plane (or closer?) and 1 means that it is at the camera's far plane (or further?) - i.e. this is equivalent to "entries are the distance from the camera's near plane to the environment, in unspecified units".
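
For reference, under the assumption of a standard perspective projection with the depth range mapped to [0, 1], recovering a metric eye-space distance from such a normalized value requires both near and far, e.g.:

```ts
// Assumes a standard perspective projection with depth mapped to [0, 1]:
// z = 0 at the near plane, z = 1 at the far plane.
function eyeDepthFromNormalized(z: number, near: number, far: number): number {
  // Gives `near` for z = 0 and `far` for z = 1; unusable without knowing near & far.
  return (far * near) / (far - z * (far - near));
}
```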

If the data were returned according to point 2, then the system's near & far are not needed - you have data in some physical units, in eye space ("distance from the camera to the user's environment"), and you can use it in the shader for occlusion if you transform it by your own projection matrix first (pick near & far in whatever way works for you, just make sure the FOV matches).
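
The opposite direction, sketched under the same assumption of a standard perspective projection with depth mapped to [0, 1]: given a metric eye-space distance, run it through a projection of your own choosing to get a value comparable against your depth buffer (in practice this happens per-fragment in a shader, with near/far/FOV matching your rendering matrices; the function below is just the scalar math):

```ts
// Inverse of the conversion above, under the same projection assumptions:
// turn a metric eye-space distance into a [0, 1] depth value for comparison.
function normalizedDepthFromEyeDepth(dEye: number, near: number, far: number): number {
  // Gives 0 for dEye = near and 1 for dEye = far.
  return (far / (far - near)) * (1 - near / dEye);
}
```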

Are you sure that you need to do the adjustment in that case? Are you adding the near plane distance in your shaders?

I'm quite certain that if it's "distance from the camera to the environment" (option 2) and the spec says it should be "distance from view's near plane to the environment" & we decide not to change the spec, then an adjustment over the entire buffer is going to be needed, simply because those 2 things aren't the same.

I was told that the OpenXR API returned near and far plane as well as fov to make sure that the code that calculates the scene will use the same matrices as the system code that calculates the depth texture.

Speaking of OpenXR, can you point me to the API or extension in OpenXR that you use for this?

I didn't find it on the Khronos site but it is listed on ours: https://developer.oculus.com/documentation/native/android/mobile-depth/

I have confirmed that ARCore returns the depth data according to pt.2.

Which means that we need to decide how we want to make progress here. It seems that we have 2 systems returning data in 2 different ways, and we'd like to not mandate anything that'd incur large costs on the implementers (e.g. performance impact of mandating adjusting the data in some manner). We also need to thread the needle carefully if we do not want to make a breaking change.

At a minimum, I think the description of the data contained by XRCPUDepthInformation should be changed to match the reality of what is currently returned (we'd also need an additional subsection on interpreting the results). This'd mean that XRWebGLDepthInformation will return different data compared to it - a potential trap for users, I think. I could maybe explain it away by saying that if you care about depth on the CPU, you probably want it for physics, and if you care about it on the GPU, it is probably for occlusion - this way, a difference in the data becomes more acceptable, and we may not need further changes to the spec (except we'd still need to solve #43).
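
For context, a rough sketch of the two consumption paths that framing implies (CPU-side queries for physics, a texture for occlusion on the GPU), based on the API shape at the time of this discussion; types are left as `any` and the surrounding setup is omitted:

```ts
// Rough sketch; `frame`, `view`, and `glBinding` are assumed to come from the usual
// WebXR frame loop and are typed as `any` to keep the snippet self-contained.
function sampleBothPaths(frame: any, view: any, glBinding: any) {
  // CPU path (XRCPUDepthInformation): per-point queries in meters, e.g. for physics.
  const cpuDepth = frame.getDepthInformation(view);
  if (cpuDepth) {
    const metersAtCenter = cpuDepth.getDepthInMeters(0.5, 0.5);
    console.log('depth at view center:', metersAtCenter);
  }

  // GPU path (XRWebGLDepthInformation): a texture sampled in the occlusion shader,
  // so the data never needs to be read back or reinterpreted on the CPU.
  const gpuDepth = glBinding.getDepthInformation(view);
  if (gpuDepth) {
    const depthTexture = gpuDepth.texture;
    // ...bind depthTexture and compare against fragment depth in the shader.
  }
}
```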

@cabanier, @toji - do you have any early thoughts here?

Hm... this is tricky. I certainly don't want to break anyone, but I also question how many apps are already making use of these values. I think it's likely to increase with the Quest adding this functionality, but it's being exposed there in a fairly different manner, so I think we have the opportunity to make some changes to how the data is interpreted now, as anyone who's interested in expanding their existing app's compatibility will have to update their usage regardless.

I'm also reluctant to enforce data transformation to a specific space. As @bialpio points out, there are probably different spaces that make sense for different use cases, and if we're pushing for the system to normalize we could end up just forcing devs to undo a spec-mandated transformation because we chose the "wrong" space for their use case.

In other words, if there are going to be transformations anyway, let's leave them in the hands of the person who knows best what's needed: the developer.

I feel like adding the data from #43 is the ultimate solution, because then there's no ambiguity about what the range is and different systems can conform to any requirements imposed on them by their hardware/platform. Measuring from the camera? depthNear = 0. Measuring from the projection near plane? depthNear = nearPlane. Having a far plane in place as well will be helpful for devs in terms of doing the math to shift the values as needed. (Maybe we want to say that the far plane can't be infinity? Not sure if that would make the math harder for devs or not.)
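
One way to read that proposal, as a sketch (assuming a simple additive model; the exact formula would be whatever the spec ends up mandating):

```ts
// Sketch: with depthNear exposed, "distance from the camera" can be recovered
// uniformly regardless of where the platform measures from.
function distanceFromCameraMeters(rawValue: number, rawValueToMeters: number, depthNear: number): number {
  // depthNear === 0         -> values were already measured from the camera
  // depthNear === nearPlane -> values were measured from the projection near plane
  return depthNear + rawValue * rawValueToMeters;
}
```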

I'm also reluctant to enforce data transformation to a specific space. As @bialpio points out, there are probably different spaces that make sense for different use cases, and if we're pushing for the system to normalize we could end up just forcing devs to undo a spec-mandated transformation because we chose the "wrong" space for their use case.

I agree that we don't want to tie this to a space. Quest is returning depth far/near so the author can feed them back into WebXR, not so they can interpret the values of the depth buffer differently.
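
Under that reading, "feed them back into WebXR" could be as simple as applying the reported values to the session's render state so the app's projection matrices match the ones used to produce the depth texture; a sketch, with reportedNear/reportedFar standing in for however the platform surfaces those values:

```ts
// Sketch: keep the session's projection matrices in sync with the depth data's range.
function syncDepthRange(session: any, reportedNear: number, reportedFar: number) {
  session.updateRenderState({
    depthNear: reportedNear,
    depthFar: reportedFar,
  });
}
```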

Having a far plane in place as well will be helpful for devs in terms of doing the math to shift the values as needed. (Maybe we want to say that the far plane can't be infinity? Not sure if that would make the math harder for devs or not.)

That won't work for Quest because afaik it will always report infinity for the far plane.

Sorry, I didn't mean to suggest "space" in the proper WebXR sense here. I meant "depth range".

That won't work for Quest because afaik it will always report infinity for the far plane.

Noted. So I guess we'd have to at least enable the possibility of a far plane at infinity, unless we're really confident that's what all implementations are going to do.

/facetoface to chat about the best way to move this issue forward.

I think it might be too late to add it to the agenda at this point but we will take a look Monday morning to see how we can fit it in. Please remind me on Monday.

Discussed during the F2F. Conclusions:

  • we need to expose the depthNear - this will be a breaking change (not in the API shape itself, but in the way it is used)
  • we don't need to take into account non-linearities when exposing normalized data - if we ever get a system that exposes data that had a non-linear function applied to it, we'd need a V2 version of the API
  • we probably don't need depthFar even if the data is normalized to the [near, far] range, because this can be handled by the rawValueToMeters factor. @bialpio to show the receipts aka math equations to convince others (and himself) - see the sketch after this list
  • exposing depthFar is easy, but it may break the mental model of app developers (because depthFar can be +Inf).
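
A sketch of the promised receipts, assuming the linear (no non-linearity) case from the conclusions above: if the raw buffer is normalized over [near, far] (raw = 0 at near, raw = 1 at far), then choosing rawValueToMeters = far - near and depthNear = near recovers the full range without exposing depthFar separately:

```ts
// Sketch: for a buffer linearly normalized over [near, far], far folds into the
// scale factor, so only depthNear needs to be exposed alongside rawValueToMeters.
function distanceForNormalizedBuffer(raw: number, near: number, far: number): number {
  const rawValueToMeters = far - near; // far disappears into the scale factor
  const depthNear = near;
  return depthNear + raw * rawValueToMeters; // = near + raw * (far - near)
}
```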