immersive-web / proposals

Initial proposals for future Immersive Web work (see README)

New proposal ideas?

AdaRoseCannon opened this issue

/tpac Blue skies thinking. Don't be constrained by the limits of feasibility and common sense.

What do you want the future of the Immersive Web to look like?

Related: immersive-web/administrivia#167. We can probably roll it into this agenda item.

This might be better on Friday after people have had an opportunity to review and think about things.

Potential new work: Supporting 3D screen-based hardware:
Not much is needed; mostly we just need to inform developers so they can adapt content accordingly.
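There's no capability signal for this today, so about the closest a page can get is a coarse WebXR support check. A rough sketch (TypeScript, assuming WebXR type definitions are available; canPresentStereo is just an illustrative helper name, and a dedicated "3D screen" signal would be the actual new work):

```ts
// A rough stand-in: no "3D/stereoscopic screen" signal exists today, so this
// just reuses WebXR's real capability check as a coarse proxy.
async function canPresentStereo(): Promise<boolean> {
  const xr = navigator.xr;
  if (!xr) return false;
  // Real API: resolves true if the UA could create an immersive VR session.
  return xr.isSessionSupported("immersive-vr");
}
```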

Is there any interest in supporting audio?
Input: audio detected and processed from the user's microphone
Output: spatialization and other browser-side filters
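For the output side, the Web Audio API can already spatialize arbitrary sources, including the microphone. A minimal sketch (TypeScript, assuming the user grants microphone permission; spatializeMicAt is just an illustrative helper name):

```ts
// Capture the user's microphone and spatialize it with the Web Audio API.
async function spatializeMicAt(x: number, y: number, z: number): Promise<void> {
  const ctx = new AudioContext();

  // Input: audio from the user's microphone (triggers a permission prompt).
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const source = ctx.createMediaStreamSource(stream);

  // Output: position the sound in 3D space using an HRTF panner.
  const panner = new PannerNode(ctx, {
    panningModel: "HRTF",
    distanceModel: "inverse",
    positionX: x,
    positionY: y,
    positionZ: z,
  });

  source.connect(panner).connect(ctx.destination);
}
```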

Haptics: Is there a need to control devices (output, feedback) or take input (control)? This may be covered by the Gamepad WG or other specs.
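For reference, controller rumble is already reachable in some browsers through the WebXR Gamepads Module. A minimal sketch (TypeScript, assuming WebXR typings; pulseControllers is an illustrative helper, and hapticActuators support varies by UA):

```ts
// Minimal sketch: fire a short rumble on every connected XR controller.
// GamepadHapticActuator.pulse() is the simple primitive some UAs expose today.
function pulseControllers(session: XRSession, intensity = 0.8, durationMs = 100): void {
  for (const source of session.inputSources) {
    // hapticActuators is not in all type libs yet, hence the loose cast.
    const actuator = (source.gamepad as any)?.hapticActuators?.[0];
    // pulse(value, duration) returns a Promise; fire-and-forget here.
    actuator?.pulse(intensity, durationMs);
  }
}
```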

@AdaRoseCannon Has eye tracking been discussed lately? Seems like the last discussion was back in 2018? I'm guessing that eye and face tracking will hit "mass-market" devices in the next few years (see e.g. some recent speculation based on Oculus firmware). There'll obviously be a lot of use cases that require fine-grained face tracking - e.g. to essentially map all the subtle (but important) expressions that humans can make onto their virtual avatars. A related use case is mapping the user's actual face into a shared AR environment, so that others see their face without a headset covering it.

@AdaRoseCannon Has eye tracking been discussed lately? Seems like the last discussion was back in 2018?

I see that there is an OpenXR extension, although it just tracks what point in space the user is focusing on. Is this what you're looking for, or do you want to render where the user is looking, blinking, etc.?
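For comparison, WebXR's existing per-frame pose query is roughly the shape such a gaze signal could take. A purely illustrative sketch (TypeScript, assuming WebXR typings; surfacing an actual eye-gaze input source this way is my assumption, not anything specified):

```ts
// Illustrative only: today a "gaze" target ray comes from head pose, not the
// eyes, but an eye-gaze source could plausibly be queried the same way.
function getGazePose(
  frame: XRFrame,
  source: XRInputSource,
  refSpace: XRReferenceSpace,
): XRPose | null {
  // Real API: origin/direction of the source's target ray in refSpace,
  // or null if tracking was lost for this frame.
  return frame.getPose(source.targetRaySpace, refSpace) ?? null;
}
```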

@cabanier Eye tracking would be a good start, but my particular use case needs full face tracking - to map facial expressions onto the user's avatar. As mentioned, I think this will be a common requirement for communication-focused applications. I'm guessing we'll have to wait for face tracking to hit some consumer headsets before we attempt to specify something for the web?

I'm curious how Oculus is going to expose the parameters of the user's face. iOS's ARKit approach apparently exposes 52 parameters/micro-expressions. My main hope is that the eventually-proposed spec for the web has enough parameters to escape the "uncanny valley", since there's a "long tail" of subtle and unusual facial expressions that humans are capable of making, and I think we tend to under-appreciate these until we see some CGI where they're missing.

The v2 of the above-linked Blender plugin apparently allows for 600+ different parameters, which seems closer to what's required, going by their demo animations. This 2017 study does okay with 100 expression parameters, but I think it's still firmly in uncanny/stiff territory if you look at the video examples compared to the actual 3D scans.

It seems like this whole area is still a somewhat "open" problem - so maybe not a good time for the web to be trying to standardize anything. On the other hand, if the first WebXR API in this area were "low-level" enough, then it's unlikely that it would become outdated. Once popular libraries/patterns emerge, new APIs could be developed which are more convenient and performant (a la the extensible web approach). The question would then be: is there a hypothetical low-level API which is unlikely to become outdated, and which satisfies the constraints of the web (privacy considerations, cross-device compatibility, etc.)?
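As a thought experiment, a sufficiently low-level surface could be little more than a per-frame map of named blend-shape weights that applications copy onto whatever avatar rig they use. A purely hypothetical sketch (TypeScript; the XRFaceExpressions interface and applyToAvatar helper are invented for illustration and exist in no spec):

```ts
// Purely hypothetical: a flat map of named blend-shape coefficients in [0, 1],
// ARKit-style, exposed once per frame.
interface XRFaceExpressions {
  readonly weights: ReadonlyMap<string, number>;
}

// Per-frame consumer: copy whatever coefficients the device exposes onto an
// avatar's matching morph targets, ignoring names the avatar doesn't have.
function applyToAvatar(
  face: XRFaceExpressions,
  morphTargetIndex: Map<string, number>, // e.g. from a glTF avatar's morph target names
  morphTargetInfluences: number[],
): void {
  for (const [name, weight] of face.weights) {
    const index = morphTargetIndex.get(name);
    if (index !== undefined) {
      morphTargetInfluences[index] = weight;
    }
  }
}
```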

Lower-level sensor APIs require more care around permissions, since the data is naturally more sensitive, but this is not fundamentally new territory for the web in that regard - the camera permission, for example, is an uncontroversial precedent.
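Concretely, the existing camera flow already pairs sensitive sensor access with an explicit user prompt. A minimal sketch of that precedent (TypeScript; requestCameraWithConsent is just an illustrative name):

```ts
// The browser shows its own permission UI before granting the camera stream;
// a face-tracking feature could plausibly follow the same consent pattern
// (my assumption, not anything specified).
async function requestCameraWithConsent(): Promise<MediaStream | null> {
  try {
    return await navigator.mediaDevices.getUserMedia({ video: true });
  } catch {
    // The user declined (or no camera is available); degrade gracefully.
    return null;
  }
}
```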

Are you aware of any public APIs that expose this type of information?

@cabanier Do you mean like the above-mentioned iOS ARKit? As mentioned, that API exposes 52 blend shapes. Android's ARCore has the Augmented Faces API, which exposes a face mesh, as you can see here. Snapchat's Lens Studio seems to expose a simpler mesh-like structure of 93 points. I don't think any of these would be sufficient for properly reproducing detailed facial expressions - they seem designed more for adding effects like sunglasses, hats, lipstick, tattoos, etc. than for reproducing subtle facial expressions.

If your question was more aimed at headsets: AFAIK there are currently only a few lesser-known consumer headsets that support face and eye tracking - but I'm guessing it'll be standard in the next generation of headsets (crossing my fingers for the Oculus announcement on Thursday). The VIVE Facial Tracker (demo video here), announced earlier this year, exposes 38 parameters/blend-shapes of the mouth/chin/jaw, and can apparently be connected to most PC VR headsets.

Edit: Facebook/Meta announced yesterday that they'll be selling a device in 2022 with full face tracking: https://youtu.be/WBl1qMbHs9s?t=46