bootphon / phonemizer

Simple text to phones converter for multiple languages

Home Page: https://bootphon.github.io/phonemizer/

Error when phonemizing the word "wherever" in python3.8 environment

dmitrii-obukhov opened this issue

Describe the bug
When phonemizing sentences that contain the word "wherever", the result is incorrect and nondeterministic: the same input produces different output from run to run, and sometimes the process crashes with a segmentation fault. The error occurs with Python 3.8. With Python 3.7 it does not occur.

Phonemizer version

phonemizer-3.0.1
available backends: espeak-ng-1.49.2, segments-2.2.1
uninstalled backends: espeak-mbrola, festival

System
Operating System: Amazon Linux 2
Kernel: Linux 4.14.268-205.500.amzn2.x86_64
Architecture: x86-64
Python 3.8.13

To reproduce

# setup environment
# (assumes espeak-ng 1.49.2 is already installed)
python3 -m venv venv
. venv/bin/activate
pip install phonemizer==3.0.1

# reproduce an error
echo "wherever" | phonemize
echo "wherever you are" | phonemize
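
Because the output changes from run to run, a single invocation may not show the problem. A minimal sketch of a helper that repeats the command and tallies the distinct outcomes is given below. The name `tally_outputs` and the default of 20 runs are illustrative assumptions and not part of phonemizer. The sketch only relies on the `phonemize` command reading text from standard input, as in the commands above.

```python
import subprocess
import sys
from collections import Counter


def tally_outputs(command, text, runs=20):
    """Run `command` `runs` times with `text` on stdin and count each distinct outcome.

    Hypothetical helper for illustration; it is not part of phonemizer.
    """
    outcomes = Counter()
    for _ in range(runs):
        proc = subprocess.run(
            command,
            input=text + "\n",
            capture_output=True,
            encoding="utf-8",   # phonemized output is IPA, so decode as UTF-8
            errors="replace",   # corrupted output must not raise a decode error
        )
        if proc.returncode != 0:
            # On POSIX a negative return code means the process was killed
            # by a signal (for example SIGSEGV or SIGABRT).
            outcomes[f"<exit code {proc.returncode}>"] += 1
        else:
            outcomes[proc.stdout.strip()] += 1
    return outcomes
```

Calling `tally_outputs(["phonemize"], "wherever")` inside the virtual environment returns a `Counter`. A healthy setup should yield a single key, while several keys or an `<exit code ...>` entry indicate the nondeterministic behaviour described in this report.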

Expected behavior
Output should be:

echo "wherever" | phonemize
wɛɹɛvɚ 

echo "wherever you are" | phonemize
wɛɹɛvɚ juː ɑːɹ 

Actual behavior
Actual outputs:

echo "wherever" | phonemize
[WARNING] words count mismatch on 100.0% of the lines (1/1)
wɛɹɛvɚɹ ʌ ziəɹoʊ sɪks

echo "wherever" | phonemize
[WARNING] words count mismatch on 100.0% of the lines (1/1)
wɛɹɛvɚɹ ʌ 

echo "wherever" | phonemize
[WARNING] words count mismatch on 100.0% of the lines (1/1)
wɛɹɛvɚɹ ʌ naɪn

echo "wherever your are" | phonemize
[WARNING] words count mismatch on 100.0% of the lines (1/1)
wɛɹɛvɚ j m m m m m m m m jʊɹ ɑːɹ 

echo "wherever your are" | phonemize
[WARNING] 1 utterances containing language switches on lines 1
[WARNING] extra phones may appear in the "en-us" phoneset
[WARNING] language switch flags have been kept (applying "keep-flags" policy)
[WARNING] words count mismatch on 100.0% of the lines (1/1)
wɛɹɾɛvõ ʌ(gn) æ ziəɾoʊ jə ə 

echo "wherever your are" | phonemize
*** stack smashing detected ***: <unknown> terminated
Aborted

Additional context
With Python 3.7 the error does not reproduce.
With phonemizer 3.2.1 the error still occurs, but less often.
No errors were observed when phonemizing other words.

It seems to be an espeak issue that is being discussed here.

Thanks for reporting. Indeed, this is related to espeak, not phonemizer.