SSML file not processing under --ssml flag

Question

PeterSprague opened this issue 3 years ago

Testing both Larynx and Larynx.server install via pip3 in a venv. All dependencies are satisfied. Fedora 34 all up to date.

Using the example SSML in a file TTS-SSML_test.txt:
larynx.server: paste the contents of the file into the input box and run. With the SSML checkbox unchecked or checked, the voice recognizes the SSML commands and does not read them aloud.

Using larynx from cmd line:
$ python3 -m larynx -v southern_english_female-glow_tts < TTS-SSML_test.txt
reads whole file including all the SSML statements

$ python3 -m larynx --ssml -v southern_english_female-glow_tts < TTS-SSML_test.txt
errors:
Traceback (most recent call last):
  File "/TextToSpeech/venv/lib64/python3.9/site-packages/gruut/text_processor.py", line 479, in process
    root_element = etree.fromstring(text)
  File "/usr/lib64/python3.9/xml/etree/ElementTree.py", line 1348, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 1, column 7

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/TextToSpeech/venv/lib64/python3.9/site-packages/larynx/__main__.py", line 720, in <module>
    main()
  File "/TextToSpeech/venv/lib64/python3.9/site-packages/larynx/__main__.py", line 294, in main
    for result_idx, result in enumerate(tts_results):
  File "/TextToSpeech/venv/lib64/python3.9/site-packages/larynx/__init__.py", line 71, in text_to_speech
    for sentence in gruut.sentences(
  File "/TextToSpeech/venv/lib64/python3.9/site-packages/gruut/__init__.py", line 79, in sentences
    graph, root = text_processor(text, lang=lang, ssml=ssml, **process_args)
  File "/TextToSpeech/venv/lib64/python3.9/site-packages/gruut/text_processor.py", line 432, in __call__
    return self.process(*args, **kwargs)
  File "/TextToSpeech/venv/lib64/python3.9/site-packages/gruut/text_processor.py", line 483, in process
    root_element = etree.fromstring(f"<speak>{text}</speak>")
  File "/usr/lib64/python3.9/xml/etree/ElementTree.py", line 1348, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 1, column 22

Also tried piping the file in via cat:
cat TTS-SSML_test.txt | python3 -m larynx --ssml -v southern_english_female-glow_tts
Same error
Produces audio file without the --ssml flag, but as above includes all the SSML statements

I have been through the documentation page and tried the examples to narrow this down. The documentation has nothing specific about using an SSML file to produce the audio. The non-SSML examples all work on my workstation.

I would like to get this working for a small project that produces training audio files of Shorin-Ryu Karate Yakusokus for my black belt test practice.

Thanks,

Michael Hansen · Answer 1 · Wed Nov 03 2021 04:02:43 GMT+0800 (China Standard Time)

Hi @PeterSprague, thanks for trying out Larynx 🙂

Can you post an example of your SSML? I can't seem to reproduce the issue on my machine. Maybe something is wrong in my SSML parser.

PSprague · Answer 2 · Wed Nov 03 2021 04:19:10 GMT+0800 (China Standard Time)

I directly copied your SSML example from the README:
TTS-SSML_test.txt

$ python3 -m larynx --ssml -v southern_english_female-glow_tts < TTS-SSML_test.txt

Michael Hansen · Answer 3 · Wed Nov 03 2021 21:42:02 GMT+0800 (China Standard Time)

OK, I see what's happening now. The command-line interface for Larynx is line-based -- it assumes each line is an individual utterance. If you remove the newline characters, it should work fine.
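
To illustrate why line-based reading breaks multi-line SSML, here is a minimal sketch using only Python's standard `xml.etree.ElementTree`. It is not Larynx's actual code; the sample document and the `parses` helper are assumptions made up for this example. Parsing the first line on its own should fail with an error resembling the first `ParseError` in the traceback above, while parsing the whole input succeeds.

```python
import xml.etree.ElementTree as etree

# Hypothetical SSML document, written across several lines as a person would.
ssml_text = """<speak>
  <s>Hello world.</s>
</speak>"""


def parses(text: str) -> bool:
    """Return True if text is a complete, well-formed XML document."""
    try:
        etree.fromstring(text)
        return True
    except etree.ParseError as err:
        # Show which fragment failed and why.
        print(f"{text!r}: {err}")
        return False


# Line-based reading: every line is treated as its own utterance,
# so the opening and closing tags end up in separate fragments.
per_line = [parses(line) for line in ssml_text.splitlines()]

# Whole-input reading: the document is parsed once, tags intact.
whole = parses(ssml_text)

print("per line:", per_line)
print("whole input:", whole)
```

The opening `<speak>` line has no matching close within the same line, so it cannot be parsed as a standalone document.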

I may need to consider whether --ssml should imply reading the entire input as one utterance, or whether some other flag should indicate this.

PSprague · Answer 4 · Thu Nov 04 2021 04:07:07 GMT+0800 (China Standard Time)

"remove the newline characters"

I'm missing something here. Are you saying to create a mixed blob of text and ssml cmds? How is that even decipherable by a human writer once the file gets more than a few "sentences"?

Here is a copy of my espeak-ng SSML file that is working well. Other than the voice name, Larynx should also be able to read it:
Yakusoku-6_attacker_detail_TTS-SSML-Espeak.txt

$ espeak-ng -f Yakusoku-6_attacker_detail_TTS-SSML-Espeak.txt -s 150 -p 50 -l 30 -k20 -m

Michael Hansen · Answer 5 · Thu Nov 04 2021 04:18:07 GMT+0800 (China Standard Time)

No, I'm suggesting something like this as a workaround:

tr < Yakusoku-6_attacker_detail_TTS-SSML-Espeak.txt '\n' ' ' | bin/larynx --ssml -v en-us

If the input all goes on one line into Larynx, it will be read correctly. This is intended to allow multiple sets of sentences to come in, like:

<speak>1st set of sentences</speak>
<speak>Next set of sentences</speak>
...

but I think with SSML, people will expect it to read the entire input at once.
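
For anyone who prefers to do the flattening in Python rather than with `tr`, here is a rough sketch of the same idea. The `flatten_ssml` function and the sample document are hypothetical, not part of Larynx. It joins the lines with spaces so the whole SSML document sits on one line, then checks that the result still parses as XML.

```python
import xml.etree.ElementTree as etree


def flatten_ssml(text: str) -> str:
    """Join all lines with single spaces so the document fits on one line."""
    return " ".join(text.splitlines())


# Hypothetical multi-line SSML, kept readable for the human author.
multi_line = (
    "<speak>\n"
    "  <s>First sentence.</s>\n"
    "  <s>Second sentence.</s>\n"
    "</speak>\n"
)

single_line = flatten_ssml(multi_line)

# The flattened text has no newlines and is still one well-formed document.
root = etree.fromstring(single_line)
print(single_line)
print(root.tag, len(root))
```

This keeps the source file readable while still giving a line-based reader one complete `<speak>` document per line. Unlike `tr`, `splitlines()` also handles Windows-style line endings and leaves no trailing space.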

PSprague · Answer 6 · Thu Nov 04 2021 04:40:48 GMT+0800 (China Standard Time)

OK, stripping the newlines as it reads the file:

$ tr < Yakusoku-6_attacker_detail_TTS-SSML-Espeak.txt '\n' ' ' | python3 -m larynx --ssml -v en-us

Works well, thanks

When do you think you will be adding to the SSML set to give increased control over the delivery?

Michael Hansen · Answer 7 · Thu Nov 04 2021 21:10:52 GMT+0800 (China Standard Time)

What sorts of SSML tags do you think would be most useful?

PSprague · Answer 8 · Fri Nov 05 2021 00:35:30 GMT+0800 (China Standard Time)

TTS and SSML are very new to me; my background is more in computer vision and DL to assess ecological impacts.

I guess it really comes back to interests and/or business case. Do you want to create a self-hosted TTS solution using ML techniques to provide an alternative to Azure or Google? Then I would follow their subsets of SSML. Otherwise, if you want to target more specific use cases, then honing the subset to what enhances that usage might be the preferred development direction.

For my usage, based on https://www.w3.org/TR/speech-synthesis11/#S3.2, I think having control of the voice characteristics via "3.2.4 prosody Element" would be good.

Michael Hansen · Answer 9 · Fri Nov 12 2021 05:46:49 GMT+0800 (China Standard Time)

Fixed the --ssml input mode in Larynx 1.1 (it now reads the entire input).

Regarding prosody, I can control the rate and volume with GlowTTS (Larynx's TTS model), but pitch and contour aren't something that can be changed in the model.
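
For reference, this is roughly what rate and volume markup looks like under the W3C SSML 1.1 `prosody` element. The sketch below only builds the markup as a string; the `prosody_ssml` helper and the sample sentence are made up for illustration, and whether a given Larynx version honors these attributes is not confirmed in this thread.

```python
import xml.etree.ElementTree as etree


def prosody_ssml(text: str, rate: str = "medium", volume: str = "medium") -> str:
    """Wrap text in <speak><prosody ...> using SSML 1.1 attribute names.

    rate and volume take the spec's labels, e.g. "slow" or "loud".
    This only generates markup; it does not call any TTS engine.
    """
    speak = etree.Element("speak")
    prosody = etree.SubElement(speak, "prosody", {"rate": rate, "volume": volume})
    prosody.text = text
    # encoding="unicode" returns a str instead of bytes.
    return etree.tostring(speak, encoding="unicode")


# Hypothetical training phrase, spoken slowly and loudly.
ssml = prosody_ssml("Step back and block.", rate="slow", volume="loud")
print(ssml)
```

The output is a single line, so it can also be fed to a line-based reader without further flattening.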