Kyubyong / g2p

g2p: English Grapheme To Phoneme Conversion

Wrong parsing

qo4on opened this issue

commented

The g2p module can't handle abbreviations such as:

g2p("HTTP")
['T', 'AE1', 'P', 'T', 'IY1']

g2p("RFC")
['R', 'EH1', 'F', 'S', 'IY1']

It also can't split compound words, e.g. taxidriver -> taxi driver.

Is it possible to tune the predict function of g2p to get correct grapheme-to-phoneme conversion for abbreviations and compound words?
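For the compound-word part, one possible preprocessing step is to split an unknown word into two known words before passing it to g2p. This is only a minimal sketch: `split_compound` and the toy `vocab` are hypothetical names, not part of g2p. With a large vocabulary (for example the CMU dict keys) this naive rule will likely produce false splits, so it would need extra filtering.

```python
def split_compound(word, vocab):
    """Split `word` into two vocabulary words, or return [word] unchanged."""
    if word in vocab:
        return [word]
    # Try the longest possible first part, then shorter ones.
    for i in range(len(word) - 1, 0, -1):
        head, tail = word[:i], word[i:]
        if head in vocab and tail in vocab:
            return [head, tail]
    return [word]


# Toy vocabulary for illustration only (assumption, not g2p data).
vocab = {"tax", "taxi", "driver"}
print(" ".join(split_compound("taxidriver", vocab)))  # taxi driver
```

The joined result ("taxi driver") can then be passed to g2p as usual.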

commented

The only workaround I found is to add this to the __call__ function:

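# Words to spell out letter by letter
self.exceptions = {'http', 'https', 'rfc'}

# Extra branch in the per-word if/elif chain
elif word in self.exceptions:
  pron = []
  for i, letter in enumerate(word):
    pron.extend(self.cmu[letter][0])  # first CMU dict pronunciation of the letter name
    if i != len(word) - 1:
      pron.extend([" "])  # separator between letters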

g2p("http, rfc")
['EY1', 'CH', ' ', 'T', 'IY1', ' ', 'T', 'IY1', ' ', 'P', 'IY1', ' ', ',', ' ', 'AA1', 'R', ' ', 'EH1', 'F', ' ', 'S', 'IY1']
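The same idea can also be sketched as a wrapper, without editing the library. Everything below is an assumption for illustration: `LETTER_PHONEMES`, `EXCEPTIONS`, `spell_out` and `g2p_with_exceptions` are hypothetical names, the letter pronunciations are copied from the output above (plus 's' for 'https'), and `fallback` stands for any callable that maps a word to a list of phonemes, such as a g2p instance. Because the fallback is called once per word, sentence-level context is lost, and punctuation attached to a word (e.g. "http,") is not handled.

```python
# Letter-name pronunciations (ARPAbet); only the letters needed here.
LETTER_PHONEMES = {
    "c": ["S", "IY1"],
    "f": ["EH1", "F"],
    "h": ["EY1", "CH"],
    "p": ["P", "IY1"],
    "r": ["AA1", "R"],
    "s": ["EH1", "S"],
    "t": ["T", "IY1"],
}

# Hypothetical exception list; extend as needed.
EXCEPTIONS = {"http", "https", "rfc"}


def spell_out(word):
    """Return letter-by-letter phonemes, with ' ' between letters."""
    pron = []
    for i, letter in enumerate(word.lower()):
        if i > 0:
            pron.append(" ")
        pron.extend(LETTER_PHONEMES[letter])
    return pron


def g2p_with_exceptions(text, fallback):
    """Spell out exception words; send every other word to `fallback`."""
    prons = []
    for i, word in enumerate(text.split()):
        if i > 0:
            prons.append(" ")
        if word.lower() in EXCEPTIONS:
            prons.extend(spell_out(word))
        else:
            prons.extend(fallback(word))
    return prons


print(spell_out("RFC"))
```

Usage would look like g2p_with_exceptions("see RFC", g2p), assuming g2p is an already constructed instance.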