videojs / mux.js

Lightweight utilities for inspecting and manipulating video container formats.

Garbled CEA captions

joeyparrish opened this issue

In shaka-project/shaka-player#2395, we received a report of garbled CEA captions in Shaka Player. We do not know what is causing it, but we can reproduce the issue with mux.js in a standalone Node script that is structured much like our use of mux.js in Shaka Player:

const muxjs = require('mux.js');
const fs = require('fs');

const CaptionParser = class {
  constructor() {
    this.muxCaptionParser_ = new muxjs.mp4.CaptionParser();
    this.videoTrackIds_ = [];
    this.timescales_ = {};
  }

  parseInitSegment(data) {
    this.videoTrackIds_ = muxjs.mp4.probe.videoTrackIds(data);
    this.timescales_ = muxjs.mp4.probe.timescale(data);
    this.muxCaptionParser_.init();
  }

  parseMediaSegment(data) {
    const parsed = this.muxCaptionParser_.parse(
        data, this.videoTrackIds_, this.timescales_);
    const captions = parsed && parsed.captions ? parsed.captions : [];
    this.muxCaptionParser_.clearParsedCaptions();
    return captions;
  }
};

function readFile(path) {
  return new Uint8Array(fs.readFileSync(path));
}

// argv[0] is the name of the interpreter
// argv[1] is the name of this script
if (process.argv.length < 4) {
  console.log('Usage: ' + process.argv[0] + ' ' + process.argv[1] +
              ' <INIT_SEGMENT> <MEDIA_SEGMENT> [<MEDIA_SEGMENT> ...]');
  process.exit(0);
}

const initSegmentPath = process.argv[2];
const mediaSegmentPaths = process.argv.slice(3);

const initSegment = readFile(initSegmentPath);
console.log('Init segment:', initSegmentPath, initSegment.length + ' bytes');

const p = new CaptionParser();
p.parseInitSegment(initSegment);

for (const path of mediaSegmentPaths) {
  const segment = readFile(path);
  console.log('Media segment:', path, segment.length + ' bytes');

  for (const caption of p.parseMediaSegment(segment)) {
    console.log(caption);
  }
}

The output is:

{ startPts: 4296348676,
  endPts: 4296519847,
  text: 'e  iuc\nri Mm,eneadiouR-- -- <i>Hetageuseu</i>',
  stream: 'CC1',
  startTime: 47737.20751111111,
  endTime: 47739.10941111111 }
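
For reference, the startTime and endTime values above are consistent with dividing the PTS values by a 90 kHz timescale. The sketch below is only a sanity check of that arithmetic. The 90000 timescale is an assumption inferred from the logged numbers, not read from the init segment, and ptsToSeconds is a made-up helper name rather than a mux.js function.

```javascript
// Assumption: a 90 kHz timescale, inferred from the logged values above.
const ASSUMED_TIMESCALE = 90000;

// Hypothetical helper: convert a PTS in timescale ticks to seconds.
function ptsToSeconds(pts, timescale) {
  return pts / timescale;
}

// PTS values copied from the caption logged above.
const caption = { startPts: 4296348676, endPts: 4296519847 };

const startSeconds = ptsToSeconds(caption.startPts, ASSUMED_TIMESCALE);
const endSeconds = ptsToSeconds(caption.endPts, ASSUMED_TIMESCALE);

console.log('start:', startSeconds);
console.log('end:', endSeconds);
console.log('duration:', endSeconds - startSeconds);
```

Under that assumption the caption lasts about 1.9 seconds, so the timing itself looks plausible even though the text does not.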

That text is supposed to be English, though I don't have a working parser for comparison to say exactly what that particular piece of text is meant to be. We get the same results for both encrypted and clear versions of the content, so we know that the encryption is not being applied to the CC parts of the segment.

I've asked permission to share the init segment and one encrypted media segment with you, and I will follow up with those as soon as I have permission.

The segments are attached.

CEA_segments.zip

Thanks!

@gesinger, @gkatsev, please let me know if there's anything else we can do to help you debug this. Thanks so much!

I'll take a look tomorrow.

I looked into it a bit today, and everything seems to be working as expected. Unfortunately, we don't really have many 608 experts anymore, so any help you can provide would be appreciated.
There also aren't many tools that help with 608, and the ones that exist aren't really maintained anymore. I tried to see what other parsers would do with this segment but couldn't get any of them to work. Even ccextractor returned nothing.
We'll continue investigating as well.

@ppatlolla-turner, since this content came from you, can you offer any other information to help with this investigation?

One thought that @ldayananda had is that maybe we're not calculating the PTS/DTS times properly for these captions.

Also, would it be possible to get a clear segment with the garbled captions?

One thing we noticed is that the segment has a lot of b-frames. Unfortunately, we don't support b-frames with 608/708 captions, though we should (see #214).

Yes, our streams do have b-frames.

I've been digging into this a bit more, and I find that it's not completely garbled. If I look at a different range of segments and log the CEA character pairs from mux.js, it becomes apparent that some are missing. For example, this caption output from mux.js:

"Lioln d nodo h homork theack a svel"

Corresponds to the spoken line:

"Lincoln did not do his homework on the back of a shovel"

Several CEA character pairs are just plain missing.
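
One way to check the "missing characters" reading is to test whether the mux.js output can be produced from the spoken line purely by deleting characters. The following is a minimal sketch of that check. The isSubsequence helper is written just for this illustration and is not part of mux.js.

```javascript
// Returns true if `partial` can be produced from `full` by deleting
// characters, without reordering or substituting any of them.
function isSubsequence(partial, full) {
  let matched = 0;
  for (const ch of full) {
    if (matched < partial.length && ch === partial[matched]) {
      matched++;
    }
  }
  return matched === partial.length;
}

const spoken = 'Lincoln did not do his homework on the back of a shovel';
const parsed = 'Lioln d nodo h homork theack a svel';

console.log(isSubsequence(parsed, spoken)); // true
console.log('characters dropped:', spoken.length - parsed.length);
```

The check passes, which fits the observation that characters are being dropped rather than scrambled or replaced.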

When I take the same content and run it through FFmpeg to remove b-frames and Shaka Packager to re-fragment it, I find that the text is correctly parsed in mux.js. The segment I posted above, which results in:

"e iuc\nri Mm,eneadiouR-- -- Hetageuseu"

Becomes:

"DOCENT TRAINER:\nHere at the American\nHeritage Museum,"

So this does seem related to b-frames in the content.

What would it take to support b-frames correctly?

A colleague has just pointed this out to me:

        if (sampleCompositionTimeOffsetPresent) {
          // Note: this should be a signed int if version is 1
          sample.compositionTimeOffset = view.getUint32(offset);
          offset += 4;
        }

He says that the content in this issue has v1 TRUN boxes and some negative offsets. It's possible that my re-encoding of the content to remove b-frames may have coincidentally changed the TRUN boxes, too. So it may not be directly caused by b-frames at all.
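
To illustrate why the signedness matters, here is a minimal sketch. It is not the actual mux.js code. It assumes the ISO BMFF rule that the trun sample composition time offset is an unsigned 32-bit integer in version 0 and a signed 32-bit integer in version 1. The helper name readCompositionTimeOffset and the sample value -3003 are made up for illustration.

```javascript
// Hypothetical helper (not the mux.js API): read a trun sample composition
// time offset, treating it as signed only when the box version is 1.
function readCompositionTimeOffset(view, offset, version) {
  return version === 1 ? view.getInt32(offset) : view.getUint32(offset);
}

// Build four bytes holding a negative offset, as a v1 trun might store it.
// -3003 is an arbitrary illustrative value, not taken from the real content.
const buffer = new ArrayBuffer(4);
const view = new DataView(buffer);
view.setInt32(0, -3003);

// Reading the same bytes without regard to the version (always unsigned)
// versus honoring version 1 (signed).
const asUnsigned = readCompositionTimeOffset(view, 0, 0);
const asSigned = readCompositionTimeOffset(view, 0, 1);

console.log(asUnsigned); // 4294964293
console.log(asSigned);   // -3003
```

Read as unsigned, a small negative offset becomes a value just below 2^32, so any presentation time derived from it would land far from the sample's real position.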

We will check whether our packaging is incorrect, as suggested above, and get back to you.

I think I have a very simple fix. I'm now able to parse the content from @ppatlolla-turner. The problem appears to be in the trun box parser in mux.js. I'll send a PR shortly.

Thanks @joeyparrish
Really appreciate the effort.

We're always happy to help. Thanks to the mux.js team for feedback and review.