[BUG] MP4 box name as random video data at beginning of TS-recording trips format-detection

hurda opened this issue 2 years ago · comments

hurda commented 2 years ago

CCExtractor version: 0.94

In raising this issue, I confirm the following:

I have read and understood the contributors guide.
I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
I have checked that the issue I'm posting isn't already reported.
I have checked that the issue I'm posting isn't already solved and that no duplicates exist in closed or open issues.
I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
I have used the latest available version of CCExtractor to verify this issue exists.

Necessary information

Is this a regression (i.e. did it work before)? NO
What platform did you use? WINDOWS
What were the used arguments? ccextractorwinfull.exe -autoprogram -out=srt -bom -utf8 file.ts

Video links

Is one needed?

Additional information

After running countless DVB recordings through ccextractor to extract the subtitles from the teletext, this file was the first that was not processed at all. Instead, I got this console output:

>ccextractorwinfull.exe -out=srt -bom -utf8 file.ts
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: K:\file.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 63 decoders active]
[CEA-708: using charset "none" for all services]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: K:\file.ts
Detected MP4 box with name: moov
File seems to be a MP4
Analyzing data with GPAC (MP4 library)
Opening 'K:\file.ts': [iso file] Incomplete box 0000B00D - start 0 size 479044969
[iso file] Incomplete file while reading for dump - aborting parsing
Failed to open input file (gf_isom_open() returned error)


Total frames time:        00:00:00:000  (0 frames at 29,97fps)
Done, processing time = 0 seconds

Forcing the input-file-format with -in=ts worked and the subtitle was created successfully, but I wanted to get down to the cause of the problem.

After going through the source to see how the format detection works, I saw that CCExtractor checks the input for certain strings to determine the format; at least that's how I understood it.
I opened the TS-file in a hex-editor and searched for moov:

Position 727131 0xB185B
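
For anyone who wants to repeat the search without a hex editor, here is a minimal sketch in Python. It is not part of CCExtractor; the function name `find_in_ts` is made up for illustration, and it assumes plain 188-byte packets with the file starting on a packet boundary. It reports where the string sits and decodes the PID and adaptation_field_control bits of the enclosing TS packet.

```python
TS_PACKET_SIZE = 188   # standard MPEG-TS packet length
SYNC_BYTE = 0x47       # every TS packet starts with this byte


def find_in_ts(data: bytes, needle: bytes = b"moov"):
    """Yield a dict for every occurrence of `needle`, with TS header info.

    Assumption: `data` starts on a packet boundary and uses 188-byte
    packets (no 192-byte M2TS or 204-byte variants).
    """
    pos = data.find(needle)
    while pos != -1:
        start = (pos // TS_PACKET_SIZE) * TS_PACKET_SIZE
        header = data[start:start + 4]
        info = {
            "offset": pos,
            "packet_start": start,
            "offset_in_packet": pos - start,
        }
        if len(header) == 4 and header[0] == SYNC_BYTE:
            # PID: low 5 bits of byte 1 followed by all of byte 2 (13 bits)
            info["pid"] = ((header[1] & 0x1F) << 8) | header[2]
            # 1 = payload only, 2 = adaptation field only, 3 = both
            info["adaptation_field_control"] = (header[3] >> 4) & 0x03
        yield info
        pos = data.find(needle, pos + 1)
```

Reading the file with `open("file.ts", "rb").read()` and looping over `find_in_ts(...)` lists every hit. An `adaptation_field_control` of 1 corresponds to the payload-only case.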

Luckily that was a payload-only TS-packet of the video-PID, so I was free to just change the text to something else.
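
The same edit can be scripted. This is only a sketch under my own assumptions: `patch_copy` is a hypothetical helper, it works on a copy so the original recording stays untouched, and it refuses to write unless the expected bytes are really at the given offset. Overwriting bytes like this is only harmless when they are plain payload, as in this case.

```python
import shutil


def patch_copy(src: str, dst: str, offset: int, old: bytes, new: bytes) -> None:
    """Copy `src` to `dst`, then overwrite `old` with `new` at `offset` in the copy."""
    if len(old) != len(new):
        # Keep the file length and all packet boundaries unchanged.
        raise ValueError("replacement must have the same length as the original")
    shutil.copyfile(src, dst)
    with open(dst, "r+b") as f:
        f.seek(offset)
        if f.read(len(old)) != old:
            raise ValueError("unexpected bytes at offset, nothing was patched")
        f.seek(offset)
        f.write(new)
```

For the file in this report the call would look like `patch_copy("file.ts", "file_edit.ts", 727131, b"moov", b"xxxx")`, where the replacement text is arbitrary.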
Then I ran this modified file through ccextractor, which worked:

>ccextractorwinfull.exe -autoprogram -out=srt -bom -utf8 file_edit.ts
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: K:\file_edit.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 63 decoders active]
[CEA-708: using charset "none" for all services]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: K:\file_edit.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
VBI/teletext stream ID 2701 (0xa8d) for SID 2004 (0x7d4)
- Programme Identification Data = ProSieben.at
- Universal Time Co-ordinated = Tue Mar  8 15:33:44 2022
Notice: Teletext page with possible subtitles detected: 149
- No teletext page specified, first received suitable page is 149, not guaranteed
100%  |  34:00
Teletext decoder: 51004 packets processed

Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0

Min PTS:                                25:29:18:443
Max PTS:                                26:03:18:563
Length:                          00:34:00:120
Done, processing time = 4 seconds

Willem · Answer 1 · Wed Mar 09 2022 15:23:53 GMT+0800 (China Standard Time)

Well, it's the first time in about 7 years we see this kind of issue (code for this was added by me in #165), so that sample would definitely be welcome.

Looks like my approach back then wasn't fully bulletproof.
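
To illustrate one way such a check could be hardened (a rough sketch only, not the code from #165 and not necessarily what was later merged): test for the TS sync-byte cadence first and only fall back to looking for box names if that fails. The function names, the `min_packets` threshold, and the list of box names are all assumptions made for this example.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def looks_like_ts(data: bytes, min_packets: int = 8) -> bool:
    """True if 0x47 repeats every 188 bytes for `min_packets` consecutive
    packets, starting from some offset within the first packet length."""
    for start in range(min(TS_PACKET_SIZE, len(data))):
        positions = range(start, start + min_packets * TS_PACKET_SIZE, TS_PACKET_SIZE)
        if positions[-1] < len(data) and all(data[p] == SYNC_BYTE for p in positions):
            return True
    return False


def guess_format(data: bytes) -> str:
    """Prefer the structural TS check over a bare string match."""
    if looks_like_ts(data):
        return "ts"
    # Hypothetical stand-in for a string-based MP4 check.
    if any(name in data for name in (b"ftyp", b"moov", b"mdat")):
        return "mp4"
    return "unknown"
```

With this ordering, a stray "moov" inside video payload no longer matters as long as the sync bytes line up.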

hurda · Answer 2 · Wed Mar 09 2022 19:00:41 GMT+0800 (China Standard Time)

Trimmed the file to the first megabyte, as that's what the format-autodetection is looking at, right?
https://www.mediafire.com/file/xt5s9pd6yj3hc4q/ccextractor_moov_ts.zip/file
Contains the original file and the edited version.

Willem · Answer 3 · Wed Mar 09 2022 20:48:26 GMT+0800 (China Standard Time)

Yes, that should be sufficient indeed. Thanks for the quick share.

Carlos Fernandez Sanz · Answer 4 · Wed Mar 22 2023 05:04:17 GMT+0800 (China Standard Time)

Closing since it seems fixed already (at least based on the merge, I haven't validated)