nvog / lost-in-interpretation

Code for "Lost in Interpretation: Predicting Untranslated Terminology in Simultaneous Interpretation" at NAACL 2019

Untranslated Term Annotations

Untranslated term annotations for the NAIST Simultaneous Translation Corpus are available on request by email, once you have confirmed that you have access to the corpus itself. The corpus is available at https://ahcweb01.naist.jp/resource/stc/.

Other Requirements

The feature code also requires the EIJIRO English-Japanese bilingual dictionary, which you'll need to purchase here: http://www.eijiro.jp/get-144.htm. Alternatively, you could construct your own bilingual dictionary using NLP tools.
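
If you build your own dictionary instead, the sketch below shows one way to load it for lookup. It assumes a hypothetical tab-separated file with one English term and one Japanese translation per line. This is not the EIJIRO format and not necessarily what the feature code in this repository expects, and the name `load_bilingual_dict` is illustrative only.

```python
import io
from collections import defaultdict


def load_bilingual_dict(lines):
    """Map each lowercased English term to the set of its Japanese translations.

    Assumes a hypothetical format: "english<TAB>japanese", one pair per line.
    """
    entries = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip lines that do not contain a term/translation pair
        english, japanese = fields[0].strip().lower(), fields[1].strip()
        if english and japanese:
            entries[english].add(japanese)
    return dict(entries)


# In-memory sample; a real file would be opened with open(path, encoding="utf-8").
sample = io.StringIO(
    "neural network\tニューラルネットワーク\n"
    "Neural Network\t神経回路網\n"
    "broken line\n"
)
dictionary = load_bilingual_dict(sample)
```

Lowercasing the English side merges entries that differ only in case, which may or may not suit terminology lookup; drop the `.lower()` call if case matters for your terms.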

N-gram word frequencies are obtained from the Google Web 1T 5-gram corpus (LDC2006T13): https://catalog.ldc.upenn.edu/LDC2006T13.
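
As a rough illustration of how such counts can be turned into a frequency lookup, the sketch below reads unigram counts from lines of the form `token<TAB>count`. The file layout, the placeholder path in the comment, and the function names `load_unigram_counts` and `log_frequency` are assumptions made for this example; they are not taken from this repository, and the log-scaled value is just one common way to use raw counts as a feature.

```python
import io
import math


def load_unigram_counts(lines, vocabulary=None):
    """Read "token<TAB>count" lines into a dict, optionally keeping only `vocabulary`."""
    counts = {}
    for line in lines:
        token, sep, count = line.rstrip("\n").rpartition("\t")
        if not sep:
            continue  # no tab on this line, so it is not a token/count pair
        if vocabulary is not None and token not in vocabulary:
            continue  # restricting to known terms keeps memory use small
        counts[token] = int(count)
    return counts


def log_frequency(counts, token):
    """Add-one log count, so unseen tokens map to 0.0."""
    return math.log(counts.get(token, 0) + 1)


# In-memory sample. With the real corpus you would stream a gzipped file instead,
# e.g. gzip.open("path/to/unigram_counts.gz", "rt", encoding="utf-8") (placeholder path).
sample = io.StringIO("the\t100\ninterpretation\t7\nmalformed\n")
counts = load_unigram_counts(sample, vocabulary={"the", "interpretation"})
```

Passing a `vocabulary` set avoids holding the full unigram table in memory when only the corpus terms are needed.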
