CCExtractor / ccextractor

CCExtractor - Official version maintained by the core team

Home Page: https://www.ccextractor.org

[BUG] Incorrect placement of X-TIMESTAMP-MAP in WebVTT

bbgdzxng1 opened this issue

Summary

When ccextractor generates WebVTT output, the X-TIMESTAMP-MAP line does not immediately follow the WEBVTT header line.

tl;dr: I propose that emkman99@12b9f93 be reverted.

Reference

According to Roger Pantos, the author of the HLS RFC:

"To be clear, what HLS expects (and what the VTT spec defined prior to that 2016 change) is for the X-TIMESTAMP-MAP line to be among a set of non-blank lines immediately after the WEBVTT header line, followed by two or more line terminators, followed by the rest of the body."

Roger is the authoritative reference on HLS (RFC 8216). His full statement, which clarifies both the expected behavior and its background, is here: https://mailarchive.ietf.org/arch/msg/hls-interest/4vmLpEsV-EnmkEwMQZkzbGQai_4/
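Pantos's description of the header layout can be sketched in a few lines. This is a minimal illustration, assuming Python for readability; `vtt_header` is a hypothetical helper, not ccextractor's actual output code. It places X-TIMESTAMP-MAP on the line directly after WEBVTT and closes the header block with a blank line (two line terminators) before the first cue.

```python
def vtt_header(mpegts: int, local: str = "00:00:00.000") -> str:
    """Build a WebVTT header in the layout HLS expects (hypothetical helper).

    X-TIMESTAMP-MAP sits on the line directly after "WEBVTT", with no
    blank line in between. The header block is then closed by two line
    terminators, so the first cue starts after exactly one blank line.
    """
    header_lines = [
        "WEBVTT",
        f"X-TIMESTAMP-MAP=MPEGTS:{mpegts},LOCAL:{local}",
    ]
    # Single newlines inside the header block; "\n\n" ends it.
    return "\n".join(header_lines) + "\n\n"


if __name__ == "__main__":
    # Values taken from the sample output in this report.
    print(vtt_header(5785169281), end="")
```

The key design point is that nothing separates the two header lines; the only blank line is the one that terminates the header.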

See also:
w3c/webvtt.js#38
w3c/webvtt#485

Expected Behavior

Note that there is no blank line after the WEBVTT header line.

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:5785169281,LOCAL:00:00:00.000

00:00:06.640 --> 00:00:08.307 line:79.33%
    GEICO has a long history    

00:00:06.640 --> 00:00:08.307 line:84.66%
        of great savings        

00:00:08.342 --> 00:00:09.208 line:84.66%
       and great service.

Current Behavior

WEBVTT

X-TIMESTAMP-MAP=MPEGTS:5785169281,LOCAL:00:00:00.000

00:00:06.640 --> 00:00:08.307 line:79.33%
    GEICO has a long history    

00:00:06.640 --> 00:00:08.307 line:84.66%
        of great savings        

00:00:08.342 --> 00:00:09.208 line:84.66%
       and great service.
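The expected and current layouts above can be told apart mechanically. The sketch below is a minimal check, assuming Python and treating the header as the run of non-empty lines that starts with the WEBVTT line and ends at the first empty line, as in Pantos's description. `timestamp_map_in_header` is a hypothetical helper, and it deliberately ignores details such as a leading byte-order mark.

```python
def timestamp_map_in_header(vtt_text: str) -> bool:
    """Return True if X-TIMESTAMP-MAP appears in the WebVTT header block.

    The header block is the run of non-empty lines that starts with the
    "WEBVTT" line and ends at the first empty line (hypothetical helper;
    a leading byte-order mark is not handled).
    """
    lines = vtt_text.splitlines()  # handles both "\n" and "\r\n"
    if not lines or not lines[0].startswith("WEBVTT"):
        return False
    for line in lines[1:]:
        if line == "":
            # The first empty line closes the header; the map was not in it.
            return False
        if line.startswith("X-TIMESTAMP-MAP="):
            return True
    return False


if __name__ == "__main__":
    expected = (
        "WEBVTT\n"
        "X-TIMESTAMP-MAP=MPEGTS:5785169281,LOCAL:00:00:00.000\n"
        "\n"
        "00:00:06.640 --> 00:00:08.307 line:79.33%\n"
    )
    current = (
        "WEBVTT\n"
        "\n"
        "X-TIMESTAMP-MAP=MPEGTS:5785169281,LOCAL:00:00:00.000\n"
        "\n"
        "00:00:06.640 --> 00:00:08.307 line:79.33%\n"
    )
    print(timestamp_map_in_header(expected))  # True
    print(timestamp_map_in_header(current))   # False
```

In the current output the empty line right after WEBVTT ends the header, so the X-TIMESTAMP-MAP line falls outside it.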

Command to reproduce:

$ ccextractor "./CNN.ts" -in='ts' -1 -out='vtt' -stdout | head -n10

where CNN.ts is the CNN.ts file from the "US TV recordings, 10 minutes samples, HDHomeRun" set located at https://ccextractor.org/public/general/tvsamples/.

% ccextractor --version
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
CCExtractor detailed version info
        Version: 0.94
        Git commit: Unknown
        Compilation date: 2021-12-15
        CEA-708 decoder: C
        File SHA256: Could not open file

Details

Here is the pull request that introduced the regression:
https://github.com/CCExtractor/ccextractor/pull/1332/files
emkman99@12b9f93

I'm confident that @emkman99's PR was well-intended; however, the statement from Pantos linked above confirms the expected behavior with absolute authority. @emkman99, I hope that the snippet from Pantos is helpful.

Conclusion

emkman99@12b9f93 should be reverted to ensure that WebVTT output aligns with the HLS RFC.

[My personal view is that ccextractor should not generate X-TIMESTAMP-MAP by default and that it should instead be enabled through a --timestamp-map option, but that is a subjective opinion and would be a change in functionality. I have tried to limit this bug report to an objective clarification of the standards, quoting the author of the HLS RFC.]

Thanks - I hope this is not a contentious topic.

Closing this end-user-facing ticket, because #1464, which includes some awesome work, will now track it. You guys don't want open tickets hanging around.

As ever, many thanks @emkman99 for the very sensible enhancement and @cfsmp3 for the project.