syndicated-media / sn-spec

Transcripts

farski opened this issue · comments

Should support linking to an external document, to keep feed sizes down

What are the existing standards?

MRSS covers subtitles and text: http://www.rssboard.org/media-rss#media-subtitle http://www.rssboard.org/media-rss#media-text
It has basically no adoption, but I don't think it should be redone in another spec.

> What are the existing standards?

A good summary of choices: https://en.wikipedia.org/wiki/Timed_text

> It has basically no adoption but I don't think it should be re done into another spec.

One Very Good Thing™ to do would be to act as a neutral but opinionated body on which existing RSS standards should be supported. Media RSS has some very nice ideas, but the well-intentioned folks behind it left its nurturing to fate. That's not how you bootstrap a community standard.

I like the idea of including a transcript as a separate link. I also like encouraging the formatting of that transcript using one of the standards from https://en.wikipedia.org/wiki/Timed_text that @CharlesWiltgen mentioned. However, using timed text presents challenges similar to those discussed in the other issue about ad insertion and keeping the transcript in sync with the latest audio file it points to.

> However, using timed text presents similar challenges discussed in other issue of ad insertion and keeping the transcript in sync with the latest audio file it's pointed at.

On the bright side, we're standing on the shoulders of giants (ffmpeg, AV Foundation, etc.) that are really good at this kind of EDL-like manipulation. Even for folks that have to roll their own (web apps, maybe?), it's conceptually straightforward — for example, if a 0:10 ad is inserted at 2:30, any events that happen at or after 2:30 are simply offset by 0:10. Not saying it wouldn't be a PITA. 🙂
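To make the offset idea concrete, here is a minimal Python sketch. It assumes cues are already available as `(start, end, text)` tuples in seconds; the function name `shift_cues` and the sample cues are made up for illustration. A cue that straddles the insertion point is left alone here, which a real tool would probably need to split or stretch.

```python
def shift_cues(cues, insert_at, duration):
    """Offset every cue that starts at or after insert_at by duration (seconds).

    cues is a list of (start, end, text) tuples; a new list is returned.
    Cues that begin before insert_at are returned unchanged, even if they
    end after it (a simplification for this sketch).
    """
    shifted = []
    for start, end, text in cues:
        if start >= insert_at:
            start += duration
            end += duration
        shifted.append((start, end, text))
    return shifted


# A 0:10 ad inserted at 2:30 (150 s), as in the example above.
cues = [
    (140.0, 148.0, "Just before the break."),
    (150.0, 155.0, "Right at the break."),
    (200.0, 204.0, "Well after the break."),
]
for cue in shift_cues(cues, insert_at=150.0, duration=10.0):
    print(cue)
```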

I would like to point the discussion to WebVTT.
In an RSS feed, one could link to an external WebVTT file which includes the transcript - for example:
<atom:link rel="transcript" href="http://example.org/transcript.vtt" />
On the podcast webpage, the WebVTT file can be added as a track element in the audio/video element. This is supported by all major browsers.
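On the consuming side, picking that link out of a feed could look roughly like the Python sketch below. Note that `rel="transcript"` is only the relation value proposed in this comment, not a registered one, and the sample feed, the `FEED` constant and the `transcript_links` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical feed using the proposed rel="transcript" link.
FEED = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.org/ep1.mp3" length="123" type="audio/mpeg"/>
      <atom:link rel="transcript" href="http://example.org/transcript.vtt"/>
    </item>
  </channel>
</rss>"""


def transcript_links(feed_xml):
    """Map each item title to the hrefs of its rel="transcript" atom:link elements."""
    root = ET.fromstring(feed_xml)
    links = {}
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        links[title] = [
            link.get("href")
            for link in item.findall(f"{{{ATOM_NS}}}link")
            if link.get("rel") == "transcript"
        ]
    return links


print(transcript_links(FEED))
```

Because the transcript lives in a separate file, the feed only grows by one short element per episode.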

WebVTT is an existing spec with lots of possible time-based features (not only the text, but also speaker names, styling or any other custom data like GPS coordinates etc.) and quite a few systems already support it (screenreaders, (web) audio players with WebVTT display+search, software libs, etc.).
Search engines could also easily parse WebVTT files in an audio/video tag, then we have searchable audio ;)
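As a rough illustration of the "searchable audio" point, the Python sketch below pulls cues (with speaker names from `<v ...>` voice tags) out of a small WebVTT sample and looks up where a word is spoken. It is deliberately not a conforming WebVTT parser: it ignores cue settings, regions and styling, and the sample text and the helper names `parse_cues` and `search` are made up.

```python
import re

# hh:mm:ss.ttt or mm:ss.ttt on both sides of the arrow
TIMESTAMP = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
TIMING = re.compile(rf"({TIMESTAMP})\s+-->\s+({TIMESTAMP})")
VOICE = re.compile(r"<v\s+([^>]+)>")
TAG = re.compile(r"<[^>]+>")

SAMPLE = """WEBVTT

00:00:01.000 --> 00:00:04.000
<v Alice>Welcome to the show.

00:02:30.000 --> 00:02:35.500
<v Bob>Today we talk about transcripts.
"""


def to_seconds(timestamp):
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' to seconds."""
    parts = timestamp.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def parse_cues(vtt):
    """Return a list of dicts with start, end, speaker and plain text per cue."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                payload = "\n".join(lines[i + 1:])
                voice = VOICE.search(payload)
                cues.append({
                    "start": to_seconds(match.group(1)),
                    "end": to_seconds(match.group(2)),
                    "speaker": voice.group(1) if voice else None,
                    "text": TAG.sub("", payload).strip(),
                })
                break  # blocks without a timing line (header, notes) are skipped
    return cues


def search(cues, term):
    """Return the start times (seconds) of cues whose text contains term."""
    return [cue["start"] for cue in cues if term.lower() in cue["text"].lower()]


cues = parse_cues(SAMPLE)
print(search(cues, "transcripts"))
```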

Once again, I think as long as the metadata is tightly bound to the media file (e.g. through use of an ID3 tag or link header), we can do whatever. WebVTT is definitely the obvious choice, though I think it remains to be seen whether timed text is an important feature of these transcripts. I think if we make it a requirement, we may reduce the level of participation. If we don't make timed text a requirement, then the data can be encoded directly in the feed.

> I think if we make it a requirement, we may reduce the level of participation.

Agreed, transcripts (untimed) and subtitles/captions (timed) are both useful. I'd like to see both defined in the same way that MediaRSS sort of[1] does with media:text and media:subTitle.

FWIW I like WebVTT as the timed text format. It's supported in 82% of browsers in use worldwide and 96% in the USA, and there are apparently polyfills available for older browsers. The only potential downside is that I don't see any native iOS or Android parsers (iOS has one, but it only appears to work in HLS contexts) so that might create a bit of a chicken/egg problem initially.

For the transcript format, it sure would be nice to be able to use Markdown (.md). If full HTML is supported, I think it's likely that enterprising people will use this for all kinds of things that go well beyond the intent.

[1] IMO they didn't quite nail it because both can be used for timed text. People consuming the spec shouldn't be left wondering which to use when.