CCExtractor / ccextractor

CCExtractor - Official version maintained by the core team

Home Page: https://www.ccextractor.org

[BUG] MP4 box name as random video data at beginning of TS-recording trips format-detection

hurda opened this issue

CCExtractor version: 0.94

In raising this issue, I confirm the following:

  • I have read and understood the contributors guide.
  • I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • I have checked that the issue I'm posting isn't already reported.
  • I have checked that the issue I'm posting isn't already solved and that no duplicates exist in closed or open issues.
  • I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
  • I have used the latest available version of CCExtractor to verify this issue exists.

Necessary information

  • Is this a regression (i.e. did it work before)? NO
  • What platform did you use? WINDOWS
  • What were the used arguments? ccextractorwinfull.exe -autoprogram -out=srt -bom -utf8 file.ts

Video links

  • Is one needed?

Additional information

After running countless DVB recordings through ccextractor to extract the subtitles from the teletext, this file was the first that was not processed at all. Instead, I got this console output:

>ccextractorwinfull.exe -out=srt -bom -utf8 file.ts
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: K:\file.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 63 decoders active]
[CEA-708: using charset "none" for all services]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: K:\file.ts
Detected MP4 box with name: moov
File seems to be a MP4
Analyzing data with GPAC (MP4 library)
Opening 'K:\file.ts': ←[31m[iso file] Incomplete box 0000B00D - start 0 size 479044969
←[0m←[31m[iso file] Incomplete file while reading for dump - aborting parsing
←[0mFailed to open input file (gf_isom_open() returned error)


Total frames time:        00:00:00:000  (0 frames at 29,97fps)
Done, processing time = 0 seconds

Forcing the input file format with -in=ts worked and the subtitle file was created successfully, but I wanted to get to the root cause of the problem.

After going through the source and checking how the format detection works, I saw that CCExtractor scans the input for certain strings to determine the format; at least, that's how I understood it.
I opened the TS file in a hex editor and searched for moov:
[Screenshot: ccextractor_moov]
Position 727131 (0xB185B)
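
To illustrate why a plain string search can misfire here, below is a minimal sketch, not CCExtractor's actual code. The function names, the `min_packets` threshold, and the order of the checks are assumptions made for illustration only. It shows that a buffer can contain the bytes `moov` and still be recognisable as a transport stream by its 0x47 sync bytes repeating every 188 bytes.

```python
TS_PACKET_SIZE = 188   # size of an MPEG transport stream packet in bytes
TS_SYNC_BYTE = 0x47    # first byte of every TS packet


def find_box_name(buf: bytes, name: bytes = b"moov") -> int:
    """Return the offset of the first occurrence of `name`, or -1 if absent."""
    return buf.find(name)


def looks_like_ts(buf: bytes, min_packets: int = 8) -> bool:
    """Return True if sync bytes repeat every 188 bytes from some start offset.

    `min_packets` is a hypothetical threshold chosen for this sketch.
    """
    for start in range(min(TS_PACKET_SIZE, len(buf))):
        positions = range(start, len(buf), TS_PACKET_SIZE)
        if len(positions) < min_packets:
            break  # later start offsets can only yield fewer positions
        if all(buf[p] == TS_SYNC_BYTE for p in positions[:min_packets]):
            return True
    return False


def guess_format(buf: bytes) -> str:
    """Check the structural TS signature before falling back to the string search."""
    if looks_like_ts(buf):
        return "ts"
    if find_box_name(buf) != -1:
        return "mp4"
    return "unknown"
```

With this ordering, a recording whose video payload happens to contain `moov` would still be classified as TS. Whether the real detector uses such an ordering is not something this sketch claims.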

Luckily that was a payload-only TS-packet of the video-PID, so I was free to just change the text to something else.
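
For anyone who wants to check the same thing without reading the header by eye in a hex editor, here is a rough sketch of how the packet around a given file offset could be located and its 4-byte header decoded. The helper names and the `first_sync` parameter are made up for this example; the bit layout follows the standard MPEG-TS packet header, where an adaptation_field_control value of 1 means "payload only".

```python
TS_PACKET_SIZE = 188
TS_SYNC_BYTE = 0x47


def packet_containing(data: bytes, offset: int, first_sync: int = 0) -> bytes:
    """Return the 188-byte packet that contains `offset`.

    `first_sync` is the offset of the first sync byte in the file
    (assumed to be 0, i.e. the file starts on a packet boundary).
    """
    start = first_sync + ((offset - first_sync) // TS_PACKET_SIZE) * TS_PACKET_SIZE
    return data[start:start + TS_PACKET_SIZE]


def parse_ts_header(packet: bytes) -> dict:
    """Decode the fixed 4-byte TS packet header."""
    if len(packet) < 4 or packet[0] != TS_SYNC_BYTE:
        raise ValueError("not a TS packet (missing 0x47 sync byte)")
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        # 1 = payload only, 2 = adaptation field only, 3 = both
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }


def is_payload_only(packet: bytes) -> bool:
    """True if the packet carries payload and no adaptation field."""
    return parse_ts_header(packet)["adaptation_field_control"] == 1
```

If the file really does start on a packet boundary, position 727131 would fall 135 bytes into the packet that starts at offset 726996.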
Then I ran this modified file through ccextractor, which worked:

>ccextractorwinfull.exe -autoprogram -out=srt -bom -utf8 file_edit.ts
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: K:\file_edit.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 63 decoders active]
[CEA-708: using charset "none" for all services]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: K:\file_edit.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
VBI/teletext stream ID 2701 (0xa8d) for SID 2004 (0x7d4)
- Programme Identification Data = ProSieben.at
- Universal Time Co-ordinated = Tue Mar  8 15:33:44 2022
Notice: Teletext page with possible subtitles detected: 149
- No teletext page specified, first received suitable page is 149, not guaranteed
100%  |  34:00
Teletext decoder: 51004 packets processed

Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0

Min PTS:                                25:29:18:443
Max PTS:                                26:03:18:563
Length:                          00:34:00:120
Done, processing time = 4 seconds

Well, it's the first time in about 7 years that we've seen this kind of issue (the code for this was added by me in #165), so that sample would definitely be welcome.

Looks like my approach back then wasn't fully bulletproof.

commented

Trimmed the file to the first megabyte, as that's what the format-autodetection is looking at, right?
https://www.mediafire.com/file/xt5s9pd6yj3hc4q/ccextractor_moov_ts.zip/file
Contains the original file and the edited version.
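
In case it helps others prepare a similar sample, here is a small sketch of trimming a recording to roughly its first megabyte. The function name and parameters are made up for this example, and rounding down to whole 188-byte packets is an extra step added here, not necessarily how the attached sample was cut.

```python
TS_PACKET_SIZE = 188


def trim_head(src: str, dst: str, max_bytes: int = 1024 * 1024,
              align: int = TS_PACKET_SIZE) -> int:
    """Copy at most `max_bytes` from the start of `src` to `dst`.

    The limit is rounded down to a multiple of `align` so the copy ends on a
    packet boundary (assuming the file starts on one). Returns bytes written.
    """
    keep = max_bytes - (max_bytes % align) if align else max_bytes
    with open(src, "rb") as f:
        data = f.read(keep)
    with open(dst, "wb") as f:
        f.write(data)
    return len(data)
```

For example, `trim_head("file.ts", "file_head.ts")` would write the leading part of `file.ts` to `file_head.ts` (both file names are placeholders).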

Yes, that should be sufficient indeed. Thanks for the quick share.

Closing since it seems to be fixed already (at least based on the merge; I haven't validated it).