Wrong parsing
qo4on opened this issue
The g2p module mispronounces abbreviations such as:
g2p("HTTP")
['T', 'AE1', 'P', 'T', 'IY1']
g2p("RFC")
['R', 'EH1', 'F', 'S', 'IY1']
It also can't split compound words such as taxidriver -> taxi driver.
Is it possible to tune the predict function of g2p to get correct grapheme-to-phoneme conversion for abbreviations and compound words?
The only workaround I found is to add this to the __call__ function:
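    # Abbreviations to read letter by letter
    self.exceptions = {'http', 'https', 'rfc'}

    # Extra branch in the per-word if/elif chain
    elif word in self.exceptions:
        pron = []
        for i, letter in enumerate(word):
            # Look up each letter's own pronunciation in the CMU dictionary
            pron.extend(self.cmu[letter][0])
            if i != len(word) - 1:
                pron.append(" ")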
With this change:
g2p("http, rfc")
['EY1', 'CH', ' ', 'T', 'IY1', ' ', 'T', 'IY1', ' ', 'P', 'IY1', ' ', ',', ' ', 'AA1', 'R', ' ', 'EH1', 'F', ' ', 'S', 'IY1']
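The same idea can be sketched outside the library, without patching __call__. This is only an illustration under assumptions: LETTER_PRONS, spell_out, pronounce, and split_compound are hypothetical names, not part of g2p. The letter table is filled in by hand with the letter pronunciations visible in the output above (plus "s" for https), and fallback stands for any callable that converts a single word, such as the g2p object itself. The compound splitter assumes that a vocabulary of known words is available and only tries splits into two parts.

```python
# Hypothetical letter-name table in ARPAbet. Only the letters needed for
# the examples are listed; a real table would cover the whole alphabet.
LETTER_PRONS = {
    "c": ["S", "IY1"],
    "f": ["EH1", "F"],
    "h": ["EY1", "CH"],
    "p": ["P", "IY1"],
    "r": ["AA1", "R"],
    "s": ["EH1", "S"],
    "t": ["T", "IY1"],
}


def spell_out(word, letter_prons=LETTER_PRONS):
    """Return the phonemes of `word` read letter by letter, space-separated."""
    pron = []
    for i, letter in enumerate(word.lower()):
        if i > 0:
            pron.append(" ")  # separator between letters, none at the end
        pron.extend(letter_prons[letter])
    return pron


def pronounce(word, fallback, exceptions=frozenset({"http", "https", "rfc"})):
    """Spell out listed abbreviations; pass every other word to `fallback`."""
    if word.lower() in exceptions:
        return spell_out(word)
    return fallback(word)


def split_compound(word, vocab, min_len=3):
    """Split `word` into two words from `vocab`, or return [word] unchanged."""
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        if left in vocab and right in vocab:
            return [left, right]
    return [word]


print(spell_out("http"))
print(split_compound("taxidriver", {"taxi", "driver"}))
```

The min_len parameter is there to avoid splitting off very short fragments that happen to be in the vocabulary. With a large vocabulary, false splits are still possible, so the list of words allowed to split may need to be curated.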