bootphon / phonemizer

Simple text to phones converter for multiple languages

Home Page: https://bootphon.github.io/phonemizer/

Disparity between backends with punctuation

agkphysics opened this issue

Describe the bug
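When using the default preserve_punctuation=False, the Festival backend drops inputs that contain only punctuation from the output list, whereas the Espeak backend returns an empty string for each such input. As a result, the Festival output no longer lines up with the input list.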

Phonemizer version

phonemizer-3.2.1
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.2.1

System
Ubuntu 20.04.4
Linux kernel 5.15.0
Python 3.8.10

To reproduce

from phonemizer import phonemize

print(phonemize([".", "."], language="en-us", backend="festival"))
print(phonemize([".", "."], language="en-us", backend="espeak"))
print(phonemize([".", "."], language="mb-us1", backend="espeak-mbrola"))

Yields output

[]
['', '']
['', '']

Expected behavior
Should output:

['', '']
['', '']
['', '']
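
Until the backends agree, one possible workaround for the preserve_punctuation=False case is to filter punctuation-only items before calling the backend and put empty strings back at their positions afterwards. The sketch below is not the library's fix: `phonemize_aligned`, `is_punctuation_only`, and `PUNCT` are hypothetical names, and the mark set is assumed to be ASCII punctuation, which may differ from the marks phonemizer strips by default. The backend is passed in as a callable so the helper itself does not depend on any installed backend.

```python
import string
from typing import Callable, List

# Assumption: ASCII punctuation only; phonemizer's own default marks may differ.
PUNCT = set(string.punctuation)


def is_punctuation_only(text: str) -> bool:
    """True if text has no characters other than whitespace and punctuation."""
    return all(ch.isspace() or ch in PUNCT for ch in text)


def phonemize_aligned(
    texts: List[str],
    phonemize_fn: Callable[[List[str]], List[str]],
) -> List[str]:
    """Call phonemize_fn on the non-punctuation items only and return a list
    with one entry per input, using '' for punctuation-only inputs."""
    keep = [i for i, t in enumerate(texts) if not is_punctuation_only(t)]
    result = [""] * len(texts)
    if keep:
        outputs = phonemize_fn([texts[i] for i in keep])
        if len(outputs) != len(keep):
            # The backend still dropped or merged something; fail loudly
            # rather than return a misaligned list.
            raise ValueError("backend output is not aligned with its input")
        for i, out in zip(keep, outputs):
            result[i] = out
    return result
```

With the real library, `phonemize_fn` could be something like `functools.partial(phonemize, language="en-us", backend="festival")`, so that `phonemize_aligned([".", "."], phonemize_fn)` returns `['', '']` without the backend ever seeing the punctuation-only items.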

Hi, actually with preserve_punctuation=True another bug occurs:

from phonemizer import phonemize

print(phonemize([".", "."], language="en-us", backend="festival", preserve_punctuation=True))
print(phonemize([".", "."], language="en-us", backend="espeak", preserve_punctuation=True))
print(phonemize([".", "."], language="mb-us1", backend="espeak-mbrola", preserve_punctuation=True))

Yields

['..']
['..']
['', '']

But it should yield the following (espeak-mbrola does not support punctuation, so its output stays empty):

['.', '.']
['.', '.']
['', '']