How does the API handle polyphonic characters, e.g. "还 (hai2) 不还 (huan2) 钱"?

Question

Oceannew opened this issue 3 months ago · comments

How does the API handle polyphonic characters such as "还 (hai2) 不还 (huan2) 钱"? Both occurrences of 还 come out with the same pronunciation.

{
"input": "还不还钱",
"voice": "",
"prompt": "",
"language": "zh_us",
"model": "emoti-voice",
"response_format": "mp3",
"speed": 1.0
}

Exported MP3: https://github.com/netease-youdao/EmotiVoice/assets/37178037/f9a19d84-9b63-4adf-9c62-e8663c8cb0a7
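For anyone who wants to reproduce the request, here is a minimal sketch of sending the JSON body above from Python. The endpoint URL is an assumption (a locally hosted OpenAI-style speech route); replace it with whatever your deployment exposes. `build_request` and `synthesize` are illustrative helper names, not part of EmotiVoice.

```python
import json
import urllib.request

# Assumption: a locally hosted OpenAI-style speech endpoint. Adjust to your deployment.
API_URL = "http://localhost:8000/v1/audio/speech"


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the JSON body from the question."""
    payload = {
        "input": text,
        "voice": "",
        "prompt": "",
        "language": "zh_us",
        "model": "emoti-voice",
        "response_format": "mp3",
        "speed": 1.0,
    }
    # ensure_ascii=False keeps the Chinese characters readable in the body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "out.mp3") -> None:
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_request(text)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())


# Usage (needs a running server): synthesize("还不还钱")
```

Note that the request carries only plain text, so there is nowhere in this body to say which reading of 还 is intended.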

Yanqing Sun · Answer 1 · Tue Mar 26 2024 13:39:13 GMT+0800 (China Standard Time)

It is a good question! Perhaps you could follow these steps:

1. Generate the phonetic transcription of the text '还不还钱' by running python frontend.py data/text. This yields a phoneme sequence such as '<sos/eos> h ai2 sp1 b u4 sp1 h ai2 sp1 q ian2 <sos/eos>'.
2. Adjust the phoneme sequence as needed, for example: '<sos/eos> h ai2 sp1 b u4 sp1 h uan2 sp1 q ian2 <sos/eos>'.
3. Perform TTS inference by running python inference_am_vocoder_joint.py --logdir prompt_tts_open_source_joint --config_folder config/joint --checkpoint g_00140000 --test_file data/text_tts.

I have provided an example of my experiment for your reference.

issues_143.tar.gz
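The manual adjustment in step 2 can also be scripted. The following is a minimal sketch, assuming the phoneme line is a space-separated token string like the frontend output quoted above; `replace_syllable` is a hypothetical helper written for this thread, not part of EmotiVoice.

```python
def replace_syllable(phonemes: str, old: str, new: str, occurrence: int) -> str:
    """Replace the occurrence-th (1-based) run of tokens equal to `old` with `new`.

    `old` and `new` are space-separated phoneme tokens, e.g. "h ai2".
    """
    tokens = phonemes.split(" ")
    old_toks, new_toks = old.split(" "), new.split(" ")
    out = []
    seen = 0
    i = 0
    while i < len(tokens):
        if tokens[i:i + len(old_toks)] == old_toks:
            seen += 1
            # Only the requested occurrence is rewritten; others are kept as-is.
            out.extend(new_toks if seen == occurrence else old_toks)
            i += len(old_toks)
        else:
            out.append(tokens[i])
            i += 1
    if seen < occurrence:
        raise ValueError(f"only {seen} occurrence(s) of {old!r} found")
    return " ".join(out)


line = "<sos/eos> h ai2 sp1 b u4 sp1 h ai2 sp1 q ian2 <sos/eos>"
fixed = replace_syllable(line, "h ai2", "h uan2", occurrence=2)
print(fixed)
# <sos/eos> h ai2 sp1 b u4 sp1 h uan2 sp1 q ian2 <sos/eos>
```

The corrected line would then replace the original phoneme sequence in the file passed to --test_file before running inference.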

Oceannew · Answer 2 · Tue Mar 26 2024 14:28:38 GMT+0800 (China Standard Time)

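Then how should I decide whether it is h ai2 or h uan2? Should I add markers to the text in the input parameter, for example "input": "还(h ai2)不还(h uan2)钱", and then modify the methods in frontend.py to detect them?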
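To make the idea concrete, here is a rough sketch of how such markers could be separated from the text before it reaches the frontend. The markup "character(initial final+tone)" is the hypothetical format proposed in this comment, not something the API is known to support, and `split_markers` and `apply_overrides` are illustrative names. The sketch also assumes each character maps to exactly one syllable and that every marked syllable has both an initial and a final.

```python
import re

# Hypothetical markup: one character followed by "(initial final+tone)", e.g. 还(h uan2).
MARK = re.compile(r"(.)\(([a-z]+ [a-z]+[1-5])\)")


def split_markers(marked: str):
    """Return (plain_text, overrides); overrides maps character index -> phonemes."""
    plain_chars = []
    overrides = {}
    pos = 0
    for m in MARK.finditer(marked):
        plain_chars.extend(marked[pos:m.start()])   # unmarked text before this match
        overrides[len(plain_chars)] = m.group(2)    # index of the marked character
        plain_chars.append(m.group(1))
        pos = m.end()
    plain_chars.extend(marked[pos:])
    return "".join(plain_chars), overrides


def apply_overrides(syllables, overrides):
    """Swap in the user-specified phonemes at the marked character positions."""
    return [overrides.get(i, s) for i, s in enumerate(syllables)]


text, overrides = split_markers("还(h ai2)不还(h uan2)钱")
# Per-character syllables as a frontend might produce them for the plain text.
predicted = ["h ai2", "b u4", "h ai2", "q ian2"]
print(text, apply_overrides(predicted, overrides))
```

The plain text would go through the existing frontend unchanged, and the overrides would be applied to its per-character output afterwards, so the frontend's own polyphone prediction is only overruled where the caller asked for it.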