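Heteronym pinyin lists from different styles do not line up element by element, and there is no obvious way to extract the initial/final from a pinyin string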

Question

Roseleaves opened this issue 2 years ago

Environment

Operating system (Linux/macOS/Windows): Windows
Python version: 3.9.7
pypinyin version: 0.46.0

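Problem description

I want to build a table of finals, or a table of syllables, that shows how many characters of a text fall under each final or syllable. A heteronym is counted once for each of its readings.
Take the syllable table as an example: its rows are initials, its columns are finals, and each cell stands for one syllable.
In heteronym mode, however, I cannot determine the row and column of each syllable by iterating over the tuple formed by
pinyin(ch,style=Style.INITIALS,heteronym=True) and pinyin(ch,style=Style.FINALS,heteronym=True). The values are merged when the readings are collected, so the two lists do not correspond one to one.

Another approach is to record f = pinyin(ch,style=Style.TONE3,heteronym=True) directly and then call something like finals(f[0][0]) to get the final of that syllable. But no such syllable-to-final function seems to exist.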

Steps to reproduce

>>> from pypinyin import *
>>> pinyin('噷',style=Style.TONE,heteronym=True)
[['hm', 'xīn', 'hēn']]
>>> pinyin('噷',style=Style.FINALS,heteronym=True)
[['in', 'en']]
>>> pinyin('噷',style=Style.INITIALS,heteronym=True)
[['h', 'x']]
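
To make the mismatch concrete, here is a small sketch that only reuses the literal outputs shown above (it does not call pypinyin, and the variable names are made up for illustration). Pairing the INITIALS and FINALS lists by position yields combinations that are not readings of the character at all:

```python
# Outputs copied from the session above for the character '噷'.
tones = ['hm', 'xīn', 'hēn']   # Style.TONE: three readings
initials = ['h', 'x']          # Style.INITIALS: 'h' is listed only once
finals = ['in', 'en']          # Style.FINALS: no entry for 'hm'

# Pairing by position silently produces syllables that are not readings.
pairs = list(zip(initials, finals))
print(pairs)                    # [('h', 'in'), ('x', 'en')]
print(len(tones), len(pairs))   # 3 2
```

Neither "hin" nor "xen" is among the three readings, so the two lists cannot be used as row and column indices for the same syllable.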

Expected output

>>> from pypinyin import *
>>> pinyin('噷',style=Style.TONE,heteronym=True)
[['hm', 'xīn', 'hēn']]
>>> pinyin('噷',style=Style.FINALS,heteronym=True)
[['', 'in', 'en']]
>>> pinyin('噷',style=Style.INITIALS,heteronym=True)
[['h', 'x', 'h']]

Another output I would accept

>>> from pypinyin import *
>>> pinyin('噷',style=Style.TONE,heteronym=True)
[['hm', 'xīn', 'hēn']]
>>> pinyin('噷',style=Style.FINALS,heteronym=True)
[['m', 'in', 'en']]
>>> pinyin('噷',style=Style.INITIALS,heteronym=True)
[['h', 'x', 'h']]

Huang Huang · Answer 1 · Wed Apr 20 2022 21:15:15 GMT+0800 (China Standard Time)

For this use case, consider the approach in #225 (comment): post-process each pinyin string to obtain its initial and final.
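
The code from the linked comment is not reproduced here. As a rough, library-independent sketch of the same idea (splitting each full reading into an initial and a final, then tallying the pairs), something like the following could work. The names `strip_tone`, `split_syllable` and `INITIALS` are hypothetical helpers written for this example, not part of pypinyin, and the splitting rule is a simplification: `y` and `w` are treated as part of the final, syllabic nasals such as a bare `m` or `ng` get no special handling, and `ju`/`qu`/`xu` are not rewritten to `ü`.

```python
import unicodedata
from collections import Counter

# Combining marks used for the four tones (macron, acute, caron, grave).
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}

# Hanyu Pinyin initials; two-letter ones come first so 'zh' wins over 'z'.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")


def strip_tone(syllable):
    """Drop tone marks, e.g. 'xīn' -> 'xin'; the diaeresis of 'ü' is kept."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(c for c in decomposed if c not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def split_syllable(syllable):
    """Return (initial, final) for one pinyin syllable (simplified rule)."""
    plain = strip_tone(syllable)
    for initial in INITIALS:
        if plain.startswith(initial):
            return initial, plain[len(initial):]
    return "", plain  # zero-initial syllable such as 'ai'


# Readings of '噷' as shown in the question (Style.TONE, heteronym=True).
readings = ['hm', 'xīn', 'hēn']
pairs = [split_syllable(r) for r in readings]
print(pairs)  # [('h', 'm'), ('x', 'in'), ('h', 'en')]

# Each (initial, final) pair is one cell of the syllable table.
table = Counter(pairs)
```

Because every reading is split on its own, the initials and finals stay aligned with the readings, which corresponds to the second output asked for in the question.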

Roseleaves · Answer 2 · Mon Apr 25 2022 19:05:26 GMT+0800 (China Standard Time)

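Oh, I see the conversion functions now! OK.
Separately, I am not sure how to handle characters whose reading has been standardized to a single one in modern Mandarin, or readings that have died out.
They seem to cause quite a bit of trouble for my input method.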

>>> pinyin('之', heteronym=True)
[['zhī', 'zhū', 'zhì']]
>>> pinyin('怕', heteronym=True)
[['pà', 'bó']]
>>> pinyin('跑', heteronym=True)
[['pǎo', 'páo', 'bó']]
>>> pinyin('重', heteronym=True)
[['zhòng', 'chóng', 'tóng']]

Huang Huang · Answer 3 · Mon Apr 25 2022 21:52:50 GMT+0800 (China Standard Time)

Take a look at the approach mentioned in this issue: #198