THUNLP-MT / MT-Reading-List

A machine translation reading list maintained by Tsinghua Natural Language Processing Group

Any papers relating to transliteration?

echan00 opened this issue

In particular, I'm looking for papers on incorporating domain glossaries and on improving the accuracy and consistency of number translation in neural machine translation.


According to your description, I think you might want to have a look at the subsection Word/Phrase Constraints.

For example, the third paper in that subsection (Incorporating Discrete Translation Lexicons into Neural Machine Translation) gives an English-Japanese example in which the word Tunisia is mistranslated as ノルウェー (noruue-, i.e. Norway), while the correct translation is チュニジア (chunisia). I think this can be viewed as the transliteration problem you mentioned.

In my opinion, this problem can be alleviated by incorporating domain glossaries (say, transliteration dictionaries), which falls under Prior Knowledge Integration => Word/Phrase Constraints. If you are also looking for techniques that are easy to apply in practice, named entity translation might be a good choice; it was used by the winning system of the WMT'17 ZH-EN task (see WMT 2017). The technique is described in their system report, which might also be a good reference.
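To make the glossary idea concrete, here is a minimal sketch of placeholder-based pre- and post-processing. It is not the method of any paper in the list or of the WMT'17 system; the function names (`mask_terms`, `unmask_terms`) and the `__TERMn__` placeholder format are made up for illustration, and it assumes the MT system copies placeholder tokens through unchanged, which is not guaranteed in practice.

```python
import re


def mask_terms(source, glossary):
    """Replace glossary source terms with placeholders.

    Returns the masked sentence and a placeholder -> target-term mapping.
    `glossary` maps source-language terms to target-language terms.
    """
    mapping = {}
    masked = source
    # Longest terms first, so a longer entry is not split by a shorter one.
    for i, term in enumerate(sorted(glossary, key=len, reverse=True)):
        placeholder = f"__TERM{i}__"
        pattern = re.compile(re.escape(term), flags=re.IGNORECASE)
        if pattern.search(masked):
            masked = pattern.sub(placeholder, masked)
            mapping[placeholder] = glossary[term]
    return masked, mapping


def unmask_terms(translated, mapping):
    """Put the glossary translations back in place of the placeholders."""
    for placeholder, target in mapping.items():
        translated = translated.replace(placeholder, target)
    return translated


glossary = {"Tunisia": "チュニジア"}
masked, mapping = mask_terms("I come from Tunisia.", glossary)
# `masked` would be sent to the MT system; the string below is a
# hand-written stand-in for its output, not a real system result.
mt_output = "__TERM0__の出身です。"
print(unmask_terms(mt_output, mapping))
```

The constrained-decoding papers in Word/Phrase Constraints enforce the terms inside the decoder instead, which avoids relying on the model to copy placeholders.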

Hope this helps!

Thanks @minicheshire, will look into these!

I'm also looking for literature on fixing incorrect translations of numbers, for example (source | system output):

127萬美元 | US $ 1,700,000
EUR490 | 450欧元
$523M | 523百万澳元
十点五十五分 | 10:15
08:25am | 上午八时正
nine hundred and fifty-five thousand | 九十五万
$9.71 Billion | 9.71亿元


I think the problems you mentioned can also be alleviated by the techniques proposed in Word/Phrase Constraints. You may also refer to CopyNet, which can preserve some of the source-side text when translating.
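Independently of the model, a simple post-hoc check can flag outputs like the ones listed above. The following is a rough sketch, not taken from any of the papers mentioned: the names `extract_values` and `numbers_match` and the `MULTIPLIERS` table are assumptions for illustration. It only handles Arabic digits followed by a few magnitude words, so it covers neither numerals written out in characters or words (九十五万, nine hundred and fifty-five thousand) nor clock times, and it does not check currencies.

```python
import re
from decimal import Decimal

# Multipliers for a few magnitude words; deliberately incomplete.
MULTIPLIERS = {
    "萬": 10**4, "万": 10**4, "百万": 10**6, "亿": 10**8, "億": 10**8,
    "m": 10**6, "million": 10**6, "billion": 10**9,
}

# Longest unit first, so "million" is tried before "m".
_UNITS = "|".join(sorted(map(re.escape, MULTIPLIERS), key=len, reverse=True))
# Digits with optional thousands separators and decimals, then an optional
# magnitude word that is not the start of a longer Latin word ("10 meters").
_NUMBER = re.compile(
    rf"(\d[\d,]*(?:\.\d+)?)(?:\s*({_UNITS})(?![a-z]))?", re.IGNORECASE
)


def extract_values(text):
    """Return every numeric value in `text`, scaled by its magnitude word."""
    values = []
    for digits, unit in _NUMBER.findall(text):
        value = Decimal(digits.replace(",", ""))
        if unit:
            value *= MULTIPLIERS[unit.lower()]
        values.append(value)
    return values


def numbers_match(source, translation):
    """True if both sides contain the same multiset of numeric values."""
    return sorted(extract_values(source)) == sorted(extract_values(translation))


for src, hyp in [("127萬美元", "US $ 1,700,000"),
                 ("EUR490", "450欧元"),
                 ("$9.71 Billion", "9.71亿元")]:
    print(src, "|", hyp, "->", "ok" if numbers_match(src, hyp) else "MISMATCH")
```

Note that the $523M example would pass this check, because only the currency is wrong there; catching that would need a separate currency comparison.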