Double punctuation breaks phonemization
cfrancesco opened this issue
Francesco commented
I do not have an exhaustive list, but many double punctuation patterns break the phonemization. One example is !'
Phonemizer version 2.2, installed from pip.
~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/phonemize.py in phonemize(text, language, backend, separator, strip, preserve_punctuation, punctuation_marks, with_stress, language_switch, njobs, logger)
172 # phonemize the input text
173 return phonemizer.phonemize(
--> 174 text, separator=separator, strip=strip, njobs=njobs)
~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/backend/espeak.py in phonemize(self, text, separator, strip, njobs)
233 # finally restore the punctuation
234 return self._phonemize_postprocess(
--> 235 text, text_type, punctuation_marks)
236
237 def _command(self, fname):
~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/backend/base.py in _phonemize_postprocess(self, text, text_type, punctuation_marks)
138 # restore the punctuation is asked for
139 if self.preserve_punctuation:
--> 140 text = self._punctuator.restore(text, punctuation_marks)
141
142 # output the result formatted as a string or a list of strings
~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/punctuation.py in restore(cls, text, marks)
147
148 """
--> 149 return cls._restore_aux(str2list(text), marks, 0)
150
151 @classmethod
~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/punctuation.py in _restore_aux(cls, text, marks, num)
162 if current.position == 'E':
163 return [text[0] + current.mark] + cls._restore_aux(
--> 164 text[1:], marks[1:], num + 1)
165 if current.position == 'A':
166 return [current.mark] + cls._restore_aux(
~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/punctuation.py in _restore_aux(cls, text, marks, num)
175 restored = cls._restore_aux(
176 [text[0] + current.mark + text[1]] + text[2:],
--> 177 marks[1:], num)
178 return restored
179 else:
~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/punctuation.py in _restore_aux(cls, text, marks, num)
178 return restored
179 else:
--> 180 return [text[0]] + cls._restore_aux(text[1:], marks, num + 1)
~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/punctuation.py in _restore_aux(cls, text, marks, num)
162 if current.position == 'E':
163 return [text[0] + current.mark] + cls._restore_aux(
--> 164 text[1:], marks[1:], num + 1)
165 if current.position == 'A':
166 return [current.mark] + cls._restore_aux(
~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/punctuation.py in _restore_aux(cls, text, marks, num)
162 if current.position == 'E':
163 return [text[0] + current.mark] + cls._restore_aux(
--> 164 text[1:], marks[1:], num + 1)
165 if current.position == 'A':
166 return [current.mark] + cls._restore_aux(
~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/punctuation.py in _restore_aux(cls, text, marks, num)
161 [current.mark + text[0]] + text[1:], marks[1:], num)
162 if current.position == 'E':
--> 163 return [text[0] + current.mark] + cls._restore_aux(
164 text[1:], marks[1:], num + 1)
165 if current.position == 'A':
IndexError: list index out of range
Mathieu Bernard commented
Hi, can I have a complete example of a failing command please, with input text and options?
Mathieu Bernard commented
OK, I understand the bug: it occurs when trying to restore punctuation on an empty text. I'll publish a fix soon. Thanks for reporting.
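To illustrate the failure mode described here (punctuation marks still waiting to be restored after the text has run out), the following is a simplified, standalone sketch. It is not phonemizer's implementation and not the fix in ee591ed: the function names `restore_unguarded` and `restore_guarded` are hypothetical, marks are modelled as plain strings, and only marks attached to the end of a chunk are handled.

```python
from typing import List


def restore_unguarded(chunks: List[str], marks: List[str]) -> List[str]:
    """Append one mark to the end of each chunk, recursively.

    Hypothetical helper: it assumes there is a chunk for every mark.
    """
    if not marks:
        return chunks
    # chunks[0] raises IndexError when chunks is empty but marks remain.
    return [chunks[0] + marks[0]] + restore_unguarded(chunks[1:], marks[1:])


def restore_guarded(chunks: List[str], marks: List[str]) -> List[str]:
    """Same recursion, with an explicit check for empty text."""
    if not marks:
        return chunks
    if not chunks:
        # No text left: emit the remaining marks on their own
        # instead of indexing into an empty list.
        return ["".join(marks)]
    return [chunks[0] + marks[0]] + restore_guarded(chunks[1:], marks[1:])


if __name__ == "__main__":
    # Two consecutive marks but only one chunk of text before them.
    print(restore_guarded(["elas"], ["!", "."]))
    try:
        restore_unguarded(["elas"], ["!", "."])
    except IndexError as err:
        print("unguarded version failed:", err)
```

The guard only shows where an emptiness check belongs in a recursion of this shape; how the real library handles the leftover marks may differ.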
Mathieu Bernard commented
Fixed in ee591ed.
Michael Conrad commented
Don't know if this is related or not, but:
000004280: Hélas! . ni l'un ni l'autre ne ressemblait au sien.
Traceback (most recent call last):
File "/home/muksihs/git/Cherokee-TTS/data/comvoi_ipa/generateTrainingData.py", line 59, in <module>
use_sampa=False)
File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/phonemize.py", line 172, in phonemize
text, separator=separator, strip=strip, njobs=njobs)
File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/backend/base.py", line 126, in phonemize
text = self._punctuator.restore(text, punctuation_marks)
File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/punctuation.py", line 146, in restore
return cls._restore_aux(str2list(text), marks, 0)
File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/punctuation.py", line 166, in _restore_aux
[text[0] + m.mark + text[1]] + text[2:], marks[1:], n)
File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/punctuation.py", line 166, in _restore_aux
[text[0] + m.mark + text[1]] + text[2:], marks[1:], n)
IndexError: list index out of range
$ pip show phonemizer
Name: phonemizer
Version: 2.1
Summary: Simple text to phones converter for multiple languages
Home-page: https://github.com/bootphon/phonemizer
Author: Mathieu Bernard
Author-email: mathieu.a.bernard@inria.fr
License: GPL3
Location: /home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages
Requires: segments, attrs, joblib
Required-by:
Mathieu Bernard commented
Hi, indeed you should upgrade your phonemizer version:
>>> from phonemizer import phonemize
>>> utt = "Hélas! . ni l'un ni l'autre ne ressemblait au sien."
>>> phonemize(utt, backend='espeak', language='fr-fr', preserve_punctuation=True)
'elas ! . ni lœ̃ ni lotʁ nə ʁəsɑ̃blɛt o sjɛ̃ .'
I am running this version:
$ phonemize --version
phonemizer-2.2.2
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.1.3
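For scripts that depend on this behaviour, the installed version can also be checked from Python before calling phonemize. This is a minimal sketch under a few assumptions: `parse_version` and `installed_at_least` are hypothetical helpers, `importlib.metadata` requires Python 3.8 or later, and the (2, 2, 2) threshold is simply the version shown working above, not a documented minimum.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.2.2' into (2, 2, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_at_least(package: str, minimum: Tuple[int, ...]) -> bool:
    """Return True if `package` is installed at version `minimum` or newer."""
    try:
        return parse_version(version(package)) >= minimum
    except PackageNotFoundError:
        # The package is not installed in this environment.
        return False


if __name__ == "__main__":
    # Assumed threshold: the version reported as working in this thread.
    if not installed_at_least("phonemizer", (2, 2, 2)):
        print("phonemizer is missing or older than 2.2.2; consider upgrading")
```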