immersive-web / proposals

Initial proposals for future Immersive Web work (see README)

Common functionality for imperative AR

ddorwin opened this issue

Summary

There are a number of items related to imperative AR that need to be worked out and defined that are independent of specific AR APIs (e.g., hit-test, Real World Geometry, anchors, camera access). While the CG has been incubating those specific APIs, there is no central place for these common dependencies. Rather than picking a single API’s incubation repository, we propose to create a common repository where we can work out the details independent of any single other incubation.

The work in this repository should feed into the core WebXR Device API spec and be a prerequisite for exposing specific AR APIs.

The repository name might be something like ar-common/ or imperative-ar-common/.

Example topics

Potential topics include:

  • Fleshing out “AR mode,” especially the normative requirements and guidelines to ensure interoperability, consistent behavior for developers, and a safe and secure experience for users. Entering and exiting, consent, etc.
  • Compositing, including rendering WebGL on top of the camera feed (without exposing the latter) and any considerations related to see-through or other display types.
  • How is initialization managed, and how does the application know when the necessary RWU (real-world understanding) data is available? See also the next two items.
  • RWU initialization instructions:
    • Smartphone AR apps often have instructions and/or UX that demonstrates how to move the phone around to find planes, etc. Similarly, Hololens and Magic Leap instruct the user to scan the room.
    • Similar to headset activation instructions (immersive-web/webxr#413), developers shouldn’t need to provide instructions for every form factor, initialization mechanism, or platform.
      • The necessary information to provide accurate instructions might not even be exposed.
      • Instructions written to one device (or set of devices) may not extend to other/new devices or even iterations of the supported devices.
      • Even seemingly similar devices might have nuances that require slightly different actions.
      • The user agent or platform might have more information and be able to provide better feedback to the user. For example, "back up" or "slow down."
    • There might also be scenarios where real world data is loaded to accelerate startup and/or consent is required to use previously acquired data.
    • Are there different instructions for different use cases?
      • For example, placing furniture (floor plane) might need different actions than hanging art on the wall or looking for images on the wall, and both might be different from outdoor/open space use cases. A really simple use case like placing an object floating in front of a user might not require any initialization.
      • How can the application and user agent provide the best experience?
      • How can we abstract this in a cross-platform way?
    • If the user agent is responsible for this initialization, how does the application know when the session is sufficiently initialized? (See also the next item.) Is the requestSession() promise blocked until that point?
    • The user agent / platform may have different needs depending on whether AR was previously used (on this page or another). (Note: There are multiple privacy issues related to this, including exposing data obtained on previous pages and exposing whether AR had been previously used based on timing around initialization.)
    • Presumably, no pose information is exposed to the application during this process.
  • Exposing confidence and readiness
    • AR systems often gain knowledge and confidence over time. They may also make best effort guesses, especially in a new environment.
    • Especially for smartphone AR, the runtime initially has a very crude understanding of the environment, possibly without any planes. Applications may want to know this.
    • [How] should confidence be exposed to the application?
      • Should an application be able to request data at various confidence levels? For example, “give me a floor plane as quickly as possible” or “only return RWG for which there is high confidence.”
    • This is related to but not limited to initialization - see the previous item.
  • Tracking failure
    • Behavior
    • Whether and how to report it (i.e., reasons)
  • Consideration of non-graphics use cases (e.g., audio-only AR, accessibility use cases). For example, usage of real world understanding and perception capabilities without necessarily wanting to use them to render graphics or even UI. Do assumptions about “AR mode” break down, and how can such issues be addressed?
  • Accounting for various form factors (e.g., smartphones vs. HMDs vs. external headsets).
  • Is some sort of feature-request or configuration mechanism necessary?
    • If so, iterate with the core WebXR Device API effort on this topic.
    • Some features affect battery life.
    • SDK modes: Some SDKs require configuration for various modes (image tracking, face tracking, etc.). [How] should this be handled?
  • Are there additional things to consider for “multi-room” or “world-scale” scenarios?
  • Exploring various other configurations, such as:
    • Frame rates (e.g., some AR SDKs run at a lower framerate than the screen, and it may be configurable).
    • Focus modes
  • Privacy considerations that account for AR (last but not least!)
    • immersive-web/webxr#638 focuses on core functionality, inline, and immersive VR. We need a similar design doc (or patch) that addresses AR.
    • Ensuring camera frames are not indirectly exposed in implementations. (Use cases that need camera and other data should use explicit APIs.)
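To make the feature-request and initialization questions above concrete, here is a purely illustrative sketch of what requesting an AR session with explicit feature configuration could look like. The `'immersive-ar'` mode string, the `requiredFeatures`/`optionalFeatures` dictionary, and the `'hit-test'`/`'plane-detection'` feature names are assumptions for discussion, not settled API:

```javascript
// Hypothetical sketch of an AR session request with a feature-request
// mechanism. All mode and feature names here are illustrative.
async function startArSession(xr) {
  // xr would typically be navigator.xr; passed in here for clarity.
  if (!xr || !(await xr.isSessionSupported('immersive-ar'))) {
    return null; // AR mode not available on this device
  }
  // Open question from the list above: does this promise resolve as
  // soon as consent is granted, or only once the session is
  // "sufficiently initialized" (planes found, tracking stable, etc.)?
  return xr.requestSession('immersive-ar', {
    requiredFeatures: ['hit-test'],        // app cannot function without it
    optionalFeatures: ['plane-detection'], // degrade gracefully if absent
  });
}
```

Separating required from optional features would let the user agent fail fast (or tailor consent prompts and initialization instructions) based on what the page actually needs, which ties directly into the battery-life and SDK-mode configuration questions above.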

Hear, hear. Let's create this repo!

There's an agenda item on tomorrow's call to discuss the new modules we're working in, and I'll tack this topic onto that.

I suspect that the majority of the near-term items mentioned above will be worked out in their respective modules, and then we'll continue to work in proposals on features that are farther out in terms of shipping hardware or consensus.

I agree that there are commonalities that are foundational to how specific world understanding APIs should be designed (separate from the designs themselves) and perhaps we need a place for that discussion similar to how we use the privacy and security repo to discuss, distill, and share those topics.

That said, we haven't seen much cross-org collaboration in that and other feature repos (all PRs from one org, the great majority of issues from one org, etc.), but we do see more of it in proposals, so that will be part of the discussion.

After recent discussions it seems pretty clear (as David suggested) that we have a number of foundational AR discussions before us so I'm in support of this proposal.

David's suggestion of using the name ar-common works for me, so unless I hear objections in the meantime, I'll go ahead and create the repo on Wednesday.

Let's wait to see how tomorrow's call works out. I think a bunch of these things will end up in the AR core module, some are already decided, and some should be incubations (e.g., some might go in the performance repo).

Taking Tuesday's call into account, I've created a new ar-common repo for discussion and communication of cross-API ideas related to augmented reality.