immersive-web / layers

A feature repo for working on multi-layer support in WebXR. Feature leads: Rik Cabanier and Artem Bolgar (Oculus)

Home Page: https://immersive-web.github.io/layers/

Subtitle displays for layers which play video

AdaRoseCannon opened this issue · comments

Subtitles have been discussed a lot in some of the a11y discussions such as in XAUR. Ideally a video with a subtitle track played through layers should display the subtitle in a way which works well and makes sense.

Here is the example markup for a video with a subtitle track: https://developer.mozilla.org/en-US/docs/Web/Guide/Audio_and_video_delivery/Adding_captions_and_subtitles_to_HTML5_video#html_markup

It would be handy if the subtitles can be positioned by the UA outside of the layer boundary so a video being played off to the side can still be "listened to" using the subtitles.

It would also be handy if the subtitle layer could be used by elements with subtitle tracks.

If they could also be styled with the ::cue pseudo-element, that would be great.

https://developer.mozilla.org/en-US/docs/Web/Guide/Audio_and_video_delivery/Adding_captions_and_subtitles_to_HTML5_video#styling_the_displayed_subtitles

WebXR Layers don't display the native video element, only the video stream, just like in WebGL.
However, authors can extract the subtitles themselves and position and style them. There are several frameworks that do this for WebGL and regular WebXR.
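
As a rough sketch of the "extract the subtitles themselves" step: in a browser, the cues could come from the video element's `textTracks`, with the track's `mode` set to `"hidden"` so the cues load without the UA drawing them. The helper below only assumes cue objects with `startTime`, `endTime` and `text` (the `CueLike` type, the `activeCueText` name and the sample data are made up for illustration) and picks out what should be on screen at a given playback time.

```typescript
// Structural stand-in for the cue fields used here (startTime, endTime, text).
// CueLike is a hypothetical name, not part of any spec.
interface CueLike {
  startTime: number; // seconds
  endTime: number;   // seconds
  text: string;
}

// Return the text of every cue active at `time`, ordered by start time.
// A cue is treated as active while startTime <= time < endTime.
function activeCueText(cues: ArrayLike<CueLike>, time: number): string[] {
  const active: CueLike[] = [];
  for (let i = 0; i < cues.length; i++) {
    const cue = cues[i];
    if (cue.startTime <= time && time < cue.endTime) {
      active.push(cue);
    }
  }
  active.sort((a, b) => a.startTime - b.startTime);
  return active.map((cue) => cue.text);
}

// Hypothetical sample data standing in for a real subtitle track.
const sampleCues: CueLike[] = [
  { startTime: 0, endTime: 2, text: "Hello" },
  { startTime: 1.5, endTime: 4, text: "[door slams]" },
  { startTime: 5, endTime: 6, text: "Goodbye" },
];

console.log(activeCueText(sampleCues, 1.75));
```

The returned strings would then be drawn by the author into whatever surface they use for text; that part is left out here.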

This really should be a layer, to make it as clear and readable as possible, rather than just using WebGL. I am not sure if the Oculus OS has subtitles built in, but the UA should control font size and positioning to maximise clarity for the user and give them consistent subtitles across web sites.

The author is free to put a layer in front of the video to display the subtitles.
We can't do CSS styling because there is no markup that belongs to the layer. What element would you be styling? The one in the document? That would mean that the same element is rendered in 2 places which is not possible.
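
To make "put a layer in front of the video" a bit more concrete, here is a minimal geometry sketch. It assumes an unrotated video quad described by a center and full width/height in meters (`QuadPlacement` and `subtitleQuadFor` are hypothetical names, not WebXR API), and computes a strip of the same width sitting along the bottom edge of the video. As far as I understand the layers spec, which layer ends up on top is decided by the order of the `layers` array passed to `updateRenderState`, not by depth, so the strip keeps the same z as the video.

```typescript
// Hypothetical description of an unrotated quad in some reference space.
interface QuadPlacement {
  x: number;      // center, meters
  y: number;
  z: number;
  width: number;  // full extent, meters
  height: number; // full extent, meters
}

// Compute a subtitle strip that overlaps the bottom edge of the video quad.
// stripHeight is an illustrative default, not a recommended value.
function subtitleQuadFor(video: QuadPlacement, stripHeight: number = 0.2): QuadPlacement {
  return {
    x: video.x,
    // Bottom edge of the video plus half the strip height.
    y: video.y - video.height / 2 + stripHeight / 2,
    // Same depth as the video; draw order is assumed to come from layer order.
    z: video.z,
    width: video.width,
    height: stripHeight,
  };
}

// Example: a 1.6 m x 0.9 m video, 2 m in front of the origin at eye height 1.5 m.
const videoQuad: QuadPlacement = { x: 0, y: 1.5, z: -2, width: 1.6, height: 0.9 };
console.log(subtitleQuadFor(videoQuad));
```

In a real session the result would feed the transform and size of a second quad layer listed after the video layer; how a given implementation interprets layer width and height should be checked against the spec rather than taken from this sketch.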

With WebXR DOM Layers, this would be possible.

For me styling is a nice to have rather than a core feature.

The must-have feature, as I see it, is a user-agent-controlled subtitle for sounds, for which developers provide the text and hints as to the origin of the sound source.

The user can select what size they want the subtitles and how they want them displayed by the browser, since a few behaviors work differently for different people, e.g.:

  • Ignore position; display below eyeline
  • Ignore position; display dead ahead
  • Place directly on top of sound source
  • Place above/below sound source
  • Place above/below sound source when it is in my field of view; otherwise place in my peripheral vision.
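
The behaviors in this list could be sketched as a small policy function. Everything below is illustrative: the preference names, the `SubtitlePlacement` shape, the 45 degree half field of view and the 0.3 m offset are assumptions made for the example, not anything proposed for the spec. Only the "above" variant of above/below is shown; "below" would use a negative offset.

```typescript
type Vec3 = { x: number; y: number; z: number };

// Hypothetical names for the user preferences listed above.
type SubtitlePreference =
  | "below-eyeline"
  | "dead-ahead"
  | "on-source"
  | "above-source"
  | "above-source-or-peripheral";

// Either locked to the user's head or anchored to the sound source.
type SubtitlePlacement =
  | { anchor: "head"; region: "below-eyeline" | "dead-ahead" | "peripheral" }
  | { anchor: "source"; offsetY: number };

const ABOVE_OFFSET_METERS = 0.3; // illustrative value

// True when the angle between the view direction and the direction to the
// source is within halfFovDegrees. Degenerate (zero-length) inputs count as out of view.
function isInFieldOfView(
  viewerPos: Vec3,
  viewerForward: Vec3,
  sourcePos: Vec3,
  halfFovDegrees: number = 45
): boolean {
  const dx = sourcePos.x - viewerPos.x;
  const dy = sourcePos.y - viewerPos.y;
  const dz = sourcePos.z - viewerPos.z;
  const toSourceLen = Math.sqrt(dx * dx + dy * dy + dz * dz);
  const forwardLen = Math.sqrt(
    viewerForward.x * viewerForward.x +
      viewerForward.y * viewerForward.y +
      viewerForward.z * viewerForward.z
  );
  if (toSourceLen === 0 || forwardLen === 0) return false;
  const cosAngle =
    (dx * viewerForward.x + dy * viewerForward.y + dz * viewerForward.z) /
    (toSourceLen * forwardLen);
  return cosAngle >= Math.cos((halfFovDegrees * Math.PI) / 180);
}

// Map a user preference (plus visibility of the source) to a placement.
function choosePlacement(pref: SubtitlePreference, sourceInView: boolean): SubtitlePlacement {
  switch (pref) {
    case "below-eyeline":
      return { anchor: "head", region: "below-eyeline" };
    case "dead-ahead":
      return { anchor: "head", region: "dead-ahead" };
    case "on-source":
      return { anchor: "source", offsetY: 0 };
    case "above-source":
      return { anchor: "source", offsetY: ABOVE_OFFSET_METERS };
    case "above-source-or-peripheral":
      return sourceInView
        ? { anchor: "source", offsetY: ABOVE_OFFSET_METERS }
        : { anchor: "head", region: "peripheral" };
  }
}

// Viewer at the origin looking down -z, sound source off to the right.
const origin: Vec3 = { x: 0, y: 0, z: 0 };
const lookingForward: Vec3 = { x: 0, y: 0, z: -1 };
const sourceToTheSide: Vec3 = { x: 3, y: 0, z: 0 };
const sideInView = isInFieldOfView(origin, lookingForward, sourceToTheSide);
console.log(choosePlacement("above-source-or-peripheral", sideInView));
```

The point of the sketch is that the choice depends only on a user setting and a visibility test, which is why it could live in the UA rather than in each page.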

By providing a browser hook we can encourage developers to do it properly and give users a consistent subtitle experience across many sites, without needing to configure the font size on each individual web page. Or worse, a VR site could make the font too small, with no option to increase it, making the subtitles useless.

As far as I know, there is no support for positional audio in video. It would have to be dealt with separately.

We shouldn't invent something new just for WebXR layers. Authors can create subtitles by themselves or by using existing frameworks and this will work better than some default.
For parity with regular HTML, we can rely on DOM Layers.

I'm not too concerned with positional audio from the video itself. I am not even sure if subtitle files support that yet. What I am concerned with is developers playing a video with audio from a layer at a position in the environment. For 360 video, the subtitles will need to be placed where they are visible at all times.

As an accessibility feature, a default picked by the user for their own needs is better than something made by developers. Text on surfaces in WebGL is a poor reading experience, and even if/when we get DOM layers, they still won't be able to provide a global setting for subtitle behaviour.

I don't understand where the subtitles would be drawn. Would they be in their own layer?
How would they interact with other layers? Would they always be on top?

Yes, in their own layer. Yes, always on top.

On top of everything, or only the video layer?

Let's discuss this during next week's meeting. Maybe you can write up a concrete proposal (i.e. some IDL + compositing behavior)?
If people feel strongly that this should be in the initial WebXR Layers spec, I'd be happy to support it and get it implemented in the Oculus browser.

During the weekly call we discussed this issue in more detail.

There seems to be consensus that we should provide support for captions and subtitles and that it would be best if we could tie into the accessibility features of the operating system.

For instance, if there was a system preference to always provide captions, the UA should leverage that during immersive video playback. This would give the user captions in the style of the current OS.

If people think this sounds reasonable, I will add some normative text to the spec that we can discuss in an upcoming call.