Wrong parsing

Question

qo4on opened this issue 4 years ago · comments

The g2p module can't correctly handle abbreviations like:

g2p("HTTP")
['T', 'AE1', 'P', 'T', 'IY1']

g2p("RFC")
['R', 'EH1', 'F', 'S', 'IY1']

It also can't split compound words like taxidriver -> taxi driver.

Is it possible to tune the predict function of g2p to get correct grapheme-to-phoneme conversion for abbreviations and compound words?
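
For the compound-word half of the question, one possible approach is a dictionary-based split applied to the text before it reaches g2p. This is only a minimal sketch: `split_compound`, `vocab`, and `min_len` are hypothetical names, not part of g2p, and the vocabulary is assumed to be a plain set of known lowercase words (for example, the keys of a pronunciation dictionary).

```python
from functools import lru_cache
from typing import List, Optional, Set


def split_compound(word: str, vocab: Set[str], min_len: int = 3) -> Optional[List[str]]:
    """Split `word` into vocabulary words.

    Returns a single-element list if the word itself is in `vocab`,
    and None if no complete split exists.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def split_from(start: int):
        # Reached the end of the word: nothing left to split.
        if start == len(word):
            return ()
        # Try the longest candidate piece first, down to `min_len` characters.
        for end in range(len(word), start + min_len - 1, -1):
            piece = word[start:end]
            if piece in vocab:
                rest = split_from(end)
                if rest is not None:
                    return (piece,) + rest
        return None

    parts = split_from(0)
    return list(parts) if parts is not None else None


# Toy vocabulary, for illustration only.
vocab = {"taxi", "driver"}
print(split_compound("taxidriver", vocab))
```

A splitter like this can produce false splits on ordinary words when the vocabulary is large, so it is probably safest to apply it only to words that are missing from the pronunciation dictionary.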

qo4on · Answer 1 · Mon Apr 13 2020 00:05:15 GMT+0800 (China Standard Time)

The only workaround I found is to add this to the __call__ function:

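# Abbreviations that should be spelled out letter by letter
self.exceptions = {'http', 'https', 'rfc'}

# Extra branch in the per-word if/elif chain
elif word in self.exceptions:
  pron = []
  w_len = len(word)
  for i in range(w_len):
    # Use the first dictionary pronunciation of each letter name
    pron.extend(self.cmu[word[i]][0])
    # Separate letters with a space, except after the last one
    if i != w_len - 1: pron.extend([" "])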

g2p("http, rfc")
['EY1', 'CH', ' ', 'T', 'IY1', ' ', 'T', 'IY1', ' ', 'P', 'IY1', ' ', ',', ' ', 'AA1', 'R', ' ', 'EH1', 'F', ' ', 'S', 'IY1']
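
The same idea can also be sketched as a wrapper that leaves the library source untouched. This is an illustration under assumptions, not the library's API: `spell_out`, `g2p_with_exceptions`, `LETTER_PRONS`, and `fake_g2p` are hypothetical names, the letter table only contains the letters whose pronunciations appear in the output above, and `g2p` can be any callable that maps a string to a list of phonemes.

```python
import re
from typing import Callable, Dict, List, Set

# Letter-name pronunciations copied from the output above; extend as needed.
LETTER_PRONS: Dict[str, List[str]] = {
    "h": ["EY1", "CH"],
    "t": ["T", "IY1"],
    "p": ["P", "IY1"],
    "r": ["AA1", "R"],
    "f": ["EH1", "F"],
    "c": ["S", "IY1"],
}


def spell_out(word: str, letter_prons: Dict[str, List[str]]) -> List[str]:
    """Pronounce `word` letter by letter, with ' ' between letters."""
    pron: List[str] = []
    for i, letter in enumerate(word.lower()):
        if i > 0:
            pron.append(" ")
        pron.extend(letter_prons[letter])
    return pron


def g2p_with_exceptions(
    text: str,
    g2p: Callable[[str], List[str]],
    exceptions: Set[str],
    letter_prons: Dict[str, List[str]],
) -> List[str]:
    """Spell out tokens listed in `exceptions`; pass every other token to `g2p`."""
    out: List[str] = []
    # Split into word tokens and single punctuation marks.
    for i, token in enumerate(re.findall(r"\w+|[^\w\s]", text)):
        if i > 0:
            out.append(" ")
        if token.lower() in exceptions:
            out.extend(spell_out(token, letter_prons))
        else:
            out.extend(g2p(token))
    return out


def fake_g2p(token: str) -> List[str]:
    """Stand-in for a real g2p callable: echoes the token back unchanged."""
    return [token]


print(g2p_with_exceptions("http, rfc", fake_g2p, {"http", "rfc"}, LETTER_PRONS))
```

With the real library, a `G2p()` instance from `g2p_en` could be passed as `g2p` instead of the stand-in. Because this sketch calls it one token at a time, any sentence-level context the model uses (for example, to disambiguate homographs) is lost, which is the trade-off for not patching `__call__`.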