Kyubyong / g2pC

g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese

The tone change rules are not implemented well

LLauryn opened this issue

https://resources.allsetlearning.com/chinese/pronunciation/Tone_change_rules#Why_Tone_Changes_Are_Not_Written
The page above describes the tone change rules in detail.
When I ran the code below,
from g2pc import G2pC
g2p = G2pC()
print(g2p("卡尔普"))
the result was [('卡', 'nr', 'qia3', 'qia2', '/to block/to be stuck/to be wedged/customs station/a clip/a fastener/a checkpost/Taiwan pr. [ka3]/', '卡'), ('尔', 'nr', 'er3', 'er3', '/variant of 爾|尔[er3]/', '尒'), ('普', 'nr', 'pu3', 'pu3', '/general/popular/everywhere/universal/', '普')].
The correct conversion for "尔" should be 'er2', because the next character, "普", is pronounced 'pu3'.
In addition, when the input was the sentence “老虎幼崽与宠物犬玩耍”, the result was [('老虎', 'n', 'lao3 hu3', 'lao2 hu3', '/tiger/CL:隻|只[zhi1]/', '老虎'), ('幼崽', 'n', 'you4 zai3', 'you4 zai2', '/young (of an animal)/', '幼崽'), ('与', 'p', 'yu3', 'yu3', '/and/to give/together with/', '與'), ('宠', 'n', 'chong3', 'chong3', '/to love/to pamper/to spoil/to favor/', '寵'), ('物', 'n', 'wu4', 'wu4', '/thing/object/matter/abbr. for physics 物理/', '物'), ('犬', 'n', 'quan3', 'quan3', '/dog/', '犬'), ('玩耍', 'v', 'wan2 shua3', 'wan2 shua3', '/to play (as children do)/to amuse oneself/', '玩耍')]
The conversion for "与" is wrong for the same reason: it should be 'yu2', because the next character, "宠", is pronounced 'chong3'.

In my opinion, the tone change for multiple third tones may always depend on the pronunciation of the next word, even when that word belongs to a different segment.
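To illustrate the idea, here is a minimal sketch of such a rule, not g2pC's actual implementation. It assumes the per-word pinyin strings from the output above are flattened into one list of syllables, so that the lookahead crosses word boundaries. The function name `apply_third_tone_sandhi` is hypothetical, and the rule is deliberately naive: it ignores prosodic grouping, which matters for longer runs of third tones.

```python
def apply_third_tone_sandhi(syllables):
    """Naive rule: a third tone directly before another third tone
    is pronounced as a second tone.

    The lookahead uses the ORIGINAL tone of the next syllable,
    not the already-modified one.
    """
    result = []
    for i, syl in enumerate(syllables):
        next_is_third = i + 1 < len(syllables) and syllables[i + 1].endswith("3")
        if syl.endswith("3") and next_is_third:
            result.append(syl[:-1] + "2")
        else:
            result.append(syl)
    return result


# Per-word pinyin taken from the g2pC output for "老虎幼崽与宠物犬玩耍".
words = ["lao3 hu3", "you4 zai3", "yu3", "chong3", "wu4", "quan3", "wan2 shua3"]

# Flatten across word boundaries so "与" (yu3) can see "宠" (chong3).
syllables = [s for w in words for s in w.split()]

print(apply_third_tone_sandhi(syllables))
# ['lao2', 'hu3', 'you4', 'zai2', 'yu2', 'chong3', 'wu4', 'quan3', 'wan2', 'shua3']
```

With this rule "与" becomes 'yu2' and "尔" in "卡尔普" becomes 'er2', because the check is no longer limited to syllables inside the same segmented word.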

Thanks! I found it tricky to apply the tone change rules properly. Give me some time to dig into it.