immersive-web / webxr-ar-module

Repository for the WebXR Augmented Reality Module

Home Page: https://immersive-web.github.io/webxr-ar-module

Should there be a single "immersive" XRSessionMode?

ddorwin opened this issue · comments

The core spec includes an "immersive-vr" mode, and the current plan is to add an "immersive-ar" mode in this module. However, as discussed in immersive-web/webxr#786, this bifurcation of AR and VR into distinct and separately-requested sessions creates issues for progressive enhancement/fallback (#27) and may not result in the best configuration for developers or users. This issue tracks exploring and considering whether two distinct immersive sessions are the best approach. It may be impacted by decisions in #27.

immersive-web/webxr#786 (comment) explores this in more detail, including raising the question of whether there should be a single "immersive" XRSessionMode. Such an approach would likely be accompanied by other changes that would allow developers to provide information such that the user agent could configure an appropriate session. I believe such API design and affordances would address many of the issues discussed in subsequent comments. I hope that these will be considered as part of the exploration in this issue.

Reiterating what I said in the comment above, such a change does not require immediately removing "immersive-vr". Thus, this exploration can focus on the best solution going forward without concern about breaking backwards compatibility.

I would be in favor of pursuing this. I think things would be much cleaner if "immersive" was the mode, and "mixing with the world around you" was a feature (that you could request, and/or test for, if you want).

I know it's yelling into the void, but as many of you know, I've long advocated for a reactive version of WebXR, and this fits well. My preference is still that the UA, not the page, triggers the transition into immersive mode, signalling the page with an event (requesting immersive mode, and informing the page of the mode). The page would specify its preferences (e.g., via some combo of meta tags and calls to something akin to "requestSession"), but only the UA can trigger immersive mode. Probably via some visual UI, but perhaps via other means (e.g., navigation).

I'm not in favor of pursuing this because VR, immersive-AR and handheld-ar experiences need to be handled differently by the author.
A scene that is completely appropriate for a cell phone screen is more than likely too big for a headset and will look disorienting in VR. If handheld-ar ships on cell phones, the vast majority of sites will cater only to that form factor. We don't want to provide experiences that we know will be bad since it will turn people away from WebXR on headsets.

If an author wants to make their experience work on different devices, they can test for different session modes and render the scene correctly for the detected device.
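A minimal sketch of that kind of per-mode detection, for illustration only. The helper names (`pickMode`, `detectMode`) and the `XRSystemLike` interface are hypothetical; the only real API assumed is `isSessionSupported(mode)`, which resolves to a boolean.

```typescript
type ImmersiveMode = "immersive-ar" | "immersive-vr";
type ChosenMode = ImmersiveMode | "inline";

// Hypothetical structural type covering only the part of navigator.xr used here.
interface XRSystemLike {
  isSessionSupported(mode: string): Promise<boolean>;
}

// Pure helper: returns the first preferred mode that is supported,
// falling back to "inline" when none of them are.
function pickMode(
  preferred: ImmersiveMode[],
  supported: readonly string[]
): ChosenMode {
  for (const mode of preferred) {
    if (supported.indexOf(mode) !== -1) {
      return mode;
    }
  }
  return "inline";
}

// Queries each preferred mode in turn, then picks one in preference order.
async function detectMode(
  xr: XRSystemLike,
  preferred: ImmersiveMode[]
): Promise<ChosenMode> {
  const supported: string[] = [];
  for (const mode of preferred) {
    if (await xr.isSessionSupported(mode)) {
      supported.push(mode);
    }
  }
  return pickMode(preferred, supported);
}
```

In a page, this might be called as `detectMode(navigator.xr, ["immersive-ar", "immersive-vr"])`, with the author then choosing scene scale and layout based on the returned mode.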

Wouldn't this issue belong in the Core spec repo?

I agree that this should be moved to the core spec

(fwiw I am not in favor of pursuing this)

I filed this - and #27 & #29 - here so that they can be fully considered in the context of AR discussions and not block the core spec. (See also the last paragraph of my comment above.) If this one is moved, #27 should be moved as well.

I made a promise to the other editors at TPAC that I'd drop a history lesson in here. Forgive me for taking a few weeks to get around to it.

Originally the thing passed into requestSession() to indicate the session type was a dictionary. At the time our term for it was exclusive, not immersive, but we did effectively have an immersive: true option. When we started talking about AR the first stab at it was to add another argument to the dictionary (that had some long, awful name like enableEnvironmentIntegration or something similar) that then flipped the AR bit on or off. But then that presented a weird inconsistency, because now we had four possible states but one of them wasn't valid.

  • { immersive: false, AR: false} = inline! Cool!
  • { immersive: true, AR: false} = VR! Awesome!
  • { immersive: true, AR: true} = AR! Sweet!
  • { immersive: false, AR: true} = omgnowhatareyoudoing?!?

Obviously we could easily spec out the fact that that combination was invalid, but we also wondered what we would need to do if that matrix grew any bigger. If a third option were added, the number of type combinations would explode, but it was likely that only a small slice of them would be valid.
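To make the matrix concrete, here is a rough sketch of how the two booleans would map onto session types. The `LegacySessionOptions` shape and both function names are hypothetical, taken from the bullets above rather than from any shipped API.

```typescript
// Hypothetical shape of the old dictionary-style option. The field names
// mirror the bullet list above, not a real WebXR dictionary.
interface LegacySessionOptions {
  immersive: boolean;
  AR: boolean;
}

type SessionMode = "inline" | "immersive-vr" | "immersive-ar";

// Maps the two flags onto a session mode, or null for the one
// combination that has no meaning (AR without immersive).
function modeFromLegacyOptions(opts: LegacySessionOptions): SessionMode | null {
  if (!opts.immersive) {
    return opts.AR ? null : "inline";
  }
  return opts.AR ? "immersive-ar" : "immersive-vr";
}

// Each additional boolean flag doubles the number of combinations
// that would need to be classified as valid or invalid.
function combinationCount(flagCount: number): number {
  return Math.pow(2, flagCount);
}
```

With two flags there are four combinations and one is already invalid; a third flag would raise that to eight, each needing its own validity rule.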

Additionally we were still talking about how to handle required/optional features at the time, and since the session types were being passed as a dictionary the natural inclination was to just make the features another argument in the dictionary. This posed an ergonomics challenge for supportsSession at the time, though, because we didn't want to require or even allow it to report if a session was supported based on the list of desired other features because of the significant fingerprinting risk it posed. We'd have to spec out which dictionary items "counted" and which didn't when calling supportsSession. And again, that was feasible but it's not great for developers. It also meant that we would be re-litigating with every new attribute whether it was something that could be used with supportsSession or not, and additionally meant that supportsSession would have forward compat issues with dictionary values that it wasn't aware of yet, just like we ran into with requiredFeatures.

While no single issue above is fatal in and of itself, collectively they made for some significant API ergonomics concerns. After talking it through over the course of several meetings we proposed and eventually accepted the current form: The actual session type and (crucially) the thing you can test with isSessionSupported is a small and tightly controlled list of modes. The modes it exposes are based primarily on the properties of the session that we felt content creators were most likely to differentiate between as well as what was necessary to know at native initialization time. (For example: Various mobile devices use completely different runtimes depending on if you are presenting VR or AR content, so we need to differentiate between those two modes as early as possible.)

Yes, this has caused us a couple of bumps regarding where exactly the enum values live, but ultimately it feels like a less confusing and more stable long-term approach for developers.

Discussed the history above on the WG call and further talked about how AR handheld/headworn sessions should be differentiated. Seems like there's not a desire to revert back to a single immersive mode based on that discussion.