Error when phonemizing the word "wherever" in python3.8 environment
dmitrii-obukhov opened this issue
Describe the bug
When phonemizing sentences that contain the word "wherever", the result is incorrect and non-deterministic. Sometimes it crashes with a segmentation fault. The error occurs with python3.8; with python3.7 it did not occur.
Phonemizer version
phonemizer-3.0.1
available backends: espeak-ng-1.49.2, segments-2.2.1
uninstalled backends: espeak-mbrola, festival
System
Operating System: Amazon Linux 2
Kernel: Linux 4.14.268-205.500.amzn2.x86_64
Architecture: x86-64
Python 3.8.13
To reproduce
# setup environment
# (I assume that espeak is already installed, version 1.49.2)
python3 -m venv venv
. venv/bin/activate
pip install phonemizer==3.0.1
# reproduce the error
echo "wherever" | phonemize
echo "wherever you are" | phonemize
Expected behavior
Output should be:
echo "wherever" | phonemize
wɛɹɛvɚ
echo "wherever you are" | phonemize
wɛɹɛvɚ juː ɑːɹ
Actual behavior
Actual outputs:
echo "wherever" | phonemize
[WARNING] words count mismatch on 100.0% of the lines (1/1)
wɛɹɛvɚɹ ʌ ziəɹoʊ sɪks
echo "wherever" | phonemize
[WARNING] words count mismatch on 100.0% of the lines (1/1)
wɛɹɛvɚɹ ʌ
echo "wherever" | phonemize
[WARNING] words count mismatch on 100.0% of the lines (1/1)
wɛɹɛvɚɹ ʌ naɪn
echo "wherever your are" | phonemize
[WARNING] words count mismatch on 100.0% of the lines (1/1)
wɛɹɛvɚ j m m m m m m m m jʊɹ ɑːɹ
echo "wherever your are" | phonemize
[WARNING] 1 utterances containing language switches on lines 1
[WARNING] extra phones may appear in the "en-us" phoneset
[WARNING] language switch flags have been kept (applying "keep-flags" policy)
[WARNING] words count mismatch on 100.0% of the lines (1/1)
wɛɹɾɛvõ ʌ(gn) æ ziəɾoʊ jə ə
echo "wherever your are" | phonemize
*** stack smashing detected ***: <unknown> terminated
Aborted
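Because the output changes from run to run, it can help to repeat the command and tally the distinct outcomes. The following is a minimal sketch, not part of phonemizer: `collect_outputs` is a hypothetical helper that runs an arbitrary command several times with the same stdin and counts each distinct stdout (or non-zero exit status, which is negative when the process is killed by a signal).

```python
import subprocess
from collections import Counter


def collect_outputs(command, text, runs=20):
    """Run `command` `runs` times with `text` on stdin and tally the outcomes.

    Hypothetical helper for illustration. A non-zero exit status is recorded
    as "<exit N>"; subprocess reports death by signal as a negative N.
    """
    outcomes = Counter()
    for _ in range(runs):
        proc = subprocess.run(
            command, input=text, capture_output=True, text=True
        )
        if proc.returncode != 0:
            outcomes[f"<exit {proc.returncode}>"] += 1
        else:
            # Warnings go to stderr, so only the phonemized text is tallied.
            outcomes[proc.stdout.strip()] += 1
    return outcomes
```

Assuming the `phonemize` command from the virtual environment above is on the PATH, `collect_outputs(["phonemize"], "wherever\n").most_common()` would list each distinct result with its count. A single entry would indicate deterministic behavior; several entries would match the behavior reported here.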
Additional context
With python3.7 the error does not reproduce.
With phonemizer 3.2.1 the error also happens, but less often.
No errors were found in the phonemization of other words.
It seems to be an espeak issue that is being discussed here.
Thanks for reporting. Indeed, this is related to espeak, not phonemizer.