sandflow / ttconv

Subtitle conversion. Converts STL, SRT, TTML and SCC into TTML, WebVTT and SRT.

SCC > SRT error, Domesday LD Capture: AttributeError: 'NoneType' object has no attribute 'append_text'

rktcc opened this issue · comments

commented
ttconv 1.0.7 (pip install --pre ttconv)
python 3.11

head.scc.txt (rename from .txt back to .scc; GitHub didn't accept the .scc extension.)

This is an SCC file extracted from a LaserDisc film captured using a Domesday Duplicator. Additionally, this is a Japanese language film.

This issue has occurred in the past with other Domesday captures, but I had been using https://github.com/atsampson/ttconv until it stopped working, and I can't sort out what changes that fork made before its updates were merged into 1.0.7.
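For reference, here is roughly the call that fails, reduced from the CLI traceback below to the reader API it bottoms out in (just a sketch; I actually ran the `tt` command line, and I'm assuming `to_model()` can be called with the file contents alone):

```python
# Minimal sketch of the failing call, based on the traceback below.
# Assumes ttconv 1.0.7; "head.scc" is the attached sample renamed back from .txt.
from ttconv.scc.reader import to_model

with open("head.scc", encoding="utf-8") as scc_file:
    doc = to_model(scc_file.read())
```

The actual conversion stops partway through with: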

Unsupported SCC word: 0x7c                                                  
Unsupported SCC word: 0x7c                                                  
Unsupported SCC word: 0x107c                                                
Reading: |███████-------------------------------------------|  15% Complete
Traceback (most recent call last):
  File "/home/pip/.local/venv/ttconv/bin/tt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/pip/.local/venv/ttconv/lib/python3.11/site-packages/ttconv/tt.py", line 439, in main
    args.func(args)
  File "/home/pip/.local/venv/ttconv/lib/python3.11/site-packages/ttconv/tt.py", line 320, in convert
    model = scc_reader.to_model(file_as_str, reader_config, progress_callback_read)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pip/.local/venv/ttconv/lib/python3.11/site-packages/ttconv/scc/reader.py", line 621, in to_model
    context.process_line(scc_line)
  File "/home/pip/.local/venv/ttconv/lib/python3.11/site-packages/ttconv/scc/reader.py", line 556, in process_line
    self.process_text(word, line.time_code)
  File "/home/pip/.local/venv/ttconv/lib/python3.11/site-packages/ttconv/scc/reader.py", line 460, in process_text
    self.buffered_caption.append_text(word)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'append_text'

I wonder whether the capture has errors or is flawed and that is what's causing the "Unsupported SCC word" messages, or whether it's just that the Japanese character set is not supported?
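If I'm reading the traceback right, the words flagged as unsupported get skipped, so nothing ever opens a caption buffer, and the next printable word is then appended to None. Here is a purely hypothetical reconstruction of that situation, mirroring the names in the traceback (this is not ttconv's actual code):

```python
# Hypothetical reconstruction (NOT ttconv's code) of the failure mode:
# if every word that would open a caption is rejected as unsupported,
# buffered_caption is still None when a text word finally arrives.
from typing import List, Optional


class BufferedCaption:
    def __init__(self) -> None:
        self.texts: List[str] = []

    def append_text(self, text: str) -> None:
        self.texts.append(text)


class SccContextSketch:
    def __init__(self) -> None:
        # In the real reader this would be opened by control codes (PACs etc.)
        self.buffered_caption: Optional[BufferedCaption] = None

    def process_text(self, word: str) -> None:
        # ttconv 1.0.7 calls append_text() unconditionally here, which is
        # where the AttributeError comes from when no caption is open.
        self.buffered_caption.append_text(word)


SccContextSketch().process_text("text")  # AttributeError, as in the log above
```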

thank you

@valnoel Can you look at this issue in the context of your work improving the SCC reader?

commented

It was brought to my attention that the Japanese character sets are not present in the SCC codes file, so I imagine this might be a difficult task to achieve?

It was also noted that the content in the example is "two-byte Unicode"; not sure if that's helpful. Just passing on some information from the Domesday group conversation.

Thanks to the maintainers for assistance!

The SCC reader does not currently support Japanese characters, which do not appear in the CEA-608 specification.

It seems an extension was once submitted to the specification, but I don't have any more information about it...

Otherwise, it seems CEA-708 introduces Unicode character support, which allows the display of Japanese and other languages.

@palemieux What do you think?

Ok will look at this next week.

@rktcc Can you provide a link to the forum discussion thread? I could not find any specification for carrying arbitrary unicode characters in SCC.

Hi, I am sorry for the delay.

Here is the discussion on ttconv missing Japanese character sets:

https://discord.com/channels/665557267189334046/676084498097766451/1140876443719577650

I think it's not the encoding and decoding that's wrong; support for the extended EIA-608 character sets needs to be added to ttconv, along with a way to detect them.
https://github.com/sandflow/ttconv/tree/master/src/main/python/ttconv/scc/codes has no Japanese character support at all.
https://en.m.wikipedia.org/wiki/EIA-608
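For what it's worth, here's a rough sketch (mine, not ttconv code) of how a raw two-byte SCC word breaks down under basic CEA-608. Presumably anything that doesn't land in the tables under scc/codes is what gets reported as "Unsupported SCC word", and two-byte Japanese extensions would fall outside those tables entirely:

```python
# Rough classification of a two-byte SCC word under basic CEA-608
# (my own sketch; ranges follow CEA-608, not ttconv's internal tables).
def classify_scc_word(word: int) -> str:
    b1 = (word >> 8) & 0x7F  # strip the odd-parity bit from each byte
    b2 = word & 0x7F
    if b1 == 0x00 and b2 == 0x00:
        return "null padding"
    if 0x10 <= b1 <= 0x1F:
        return "control-code range (PACs, mid-row codes, special/extended characters)"
    if 0x20 <= b1 <= 0x7F:
        return "standard character pair"
    return "outside the basic CEA-608 tables"


# Words like the ones flagged in the log above (reading 0x7c as the word 0x007c):
for w in (0x007C, 0x107C):
    print(f"{w:#06x}: {classify_scc_word(w)}")
```

0x007c in particular doesn't fit any basic pattern, which would be consistent with the payload using a character-set extension ttconv doesn't know about.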

One thought is that the Norpak Non-Western addition may be what's needed...

https://discord.com/channels/665557267189334046/676084498097766451/1141486766579265576

Wikipedia says there's non-Western character support from Norpak (https://en.m.wikipedia.org/wiki/EIA-608, under "Non-Western Norpak Character Sets").

Someone mentions a reference to CEA-608 section 6.4, Table 4, for Asian languages; however, only the PRC and (South) Korea are covered.

https://discord.com/channels/665557267189334046/676084498097766451/1141484827619635240

Referencing 6.4 Character Sets (Normative), 6.4.1 Standard, CEA-608
https://media.discordapp.net/attachments/676084498097766451/1141486499452432464/image.png

There's also a thought that it could be CC/Teletext; however, since other subtitle content has been extracted from LaserDiscs using the Domesday and converted from SCC to plain-text SRT, I would guess the Japanese SCC data is the same, with just the character sets missing from ttconv.

Did Japan use CC? ISTR that they had a teletext-like system for magazine-type data, which may also have worked for subtitles/closed captions. (I know the Wikipedia article mentions that two-byte support was added to the spec, but could that be like 50 Hz being added to ATSC 1.0, an attempt to capture markets that never materialized?) https://en.wikipedia.org/wiki/JTES was the teletext system (CCIR System D?).

I hope this is helpful in some capacity, either in closing the ticket due to lack of project support or in adding some kind of additional processing.

If more info is needed I can look further. The Discord is free to join; sadly the discussion is not hosted on an actual forum. Alternatively, the general chat can be joined from IRC, on channel #domesday86 on the https://libera.chat network; you would not need to sign up for Discord in that case, as a bot relays messages both ways.

Discord Invite: https://github.com/happycube/ld-decode#documentation

Thank you again

I have joined the discord server.

In the meantime, I have spent some quality time staring at the sample file, and it does not look like CEA-608 at all, e.g.:

[image: excerpt from the sample file]

Is that noise/errors from the laserdisc capture? Could it be something totally different like bitmaps?
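For anyone who wants to stare at it the same way, a throwaway script along these lines (a standalone sketch, not part of ttconv) prints each caption line's words with the parity bit stripped, which makes it easier to judge whether the payload looks like CEA-608 text, control codes, or something else:

```python
# Standalone sketch (not part of ttconv) for eyeballing an SCC file.
# Assumes the usual Scenarist layout: "HH:MM:SS:FF<TAB>word word ...".
import re
import sys

SCC_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}[:;]\d{2})\t(.*)$")


def dump_scc(path: str) -> None:
    with open(path, encoding="utf-8") as scc_file:
        for line in scc_file:
            match = SCC_LINE.match(line.rstrip("\n"))
            if not match:
                continue  # "Scenarist_SCC V1.0" header or blank line
            time_code, payload = match.groups()
            cleaned = []
            for word in payload.split():
                if len(word) != 4:
                    continue  # skip anything that is not a clean two-byte word
                b1 = int(word[0:2], 16) & 0x7F  # strip the odd-parity bit
                b2 = int(word[2:4], 16) & 0x7F
                cleaned.append(f"{b1:02x}{b2:02x}")
            print(time_code, " ".join(cleaned))


if __name__ == "__main__":
    dump_scc(sys.argv[1])
```

Run it as `python dump_scc.py head.scc`; in genuine CEA-608 content, most cleaned words should either be printable pairs in the 0x20-0x7f range or start with a 0x10-0x1f control byte.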