mbartoli / dl4mt-cdec

Character-Level Neural Machine Translation

This is an implementation of the models described in the paper "A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation". http://arxiv.org/abs/1603.06147

Dependencies:

Most of the scripts are written in pure Theano.
The preprocessing pipeline additionally depends on the following:
Python Libraries: NLTK
MOSES: https://github.com/moses-smt/mosesdecoder
Subword-NMT (http://arxiv.org/abs/1508.07909): https://github.com/rsennrich/subword-nmt

This code is based on the dl4mt library.
link: https://github.com/nyu-dl/dl4mt-tutorial

Be sure to include the path to this library in your PYTHONPATH.
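
If you prefer not to set PYTHONPATH in your shell, a script can extend its own module search path instead. The snippet below is a minimal sketch of that alternative: `add_to_path` is a hypothetical helper (not part of dl4mt), and `~/dl4mt-tutorial` is only an assumed clone location.

```python
import os
import sys


def add_to_path(directory):
    """Prepend `directory` to sys.path if it is not already there.

    Returns the absolute, user-expanded path that was used.
    """
    path = os.path.abspath(os.path.expanduser(directory))
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


# Assumed clone location of dl4mt-tutorial; adjust to your own checkout.
DL4MT_DIR = add_to_path("~/dl4mt-tutorial")
```

This only affects the Python process that runs it, so setting PYTHONPATH is still the simpler choice when several scripts need the library.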

We recommend using the latest version of Theano.
If you want to reproduce the results exactly, however, please use the following version of Theano.
commit hash: fdfbab37146ee475b3fd17d8d104fb09bf3a8d5c
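
To confirm that a local Theano source checkout is at the commit above, you could compare it against the output of `git rev-parse HEAD`. This is a sketch under assumptions: it presumes Theano was installed from a git clone, and `current_commit` and `is_pinned` are hypothetical helper names, not part of Theano or this repository.

```python
import subprocess

# Commit hash given above for exact reproduction.
PINNED_THEANO_COMMIT = "fdfbab37146ee475b3fd17d8d104fb09bf3a8d5c"


def current_commit(repo_dir):
    """Return the commit hash checked out in the git repository at repo_dir."""
    out = subprocess.check_output(["git", "rev-parse", "HEAD"], cwd=repo_dir)
    return out.decode("ascii").strip()


def is_pinned(commit, pinned=PINNED_THEANO_COMMIT):
    """Return True if `commit` matches the pinned hash (ignoring case and whitespace)."""
    return commit.strip().lower() == pinned
```

For example, `is_pinned(current_commit("/path/to/Theano"))` would report whether the clone at that (assumed) path matches the pinned version.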

Preparing Text Corpora:

The original text corpora can be downloaded from http://www.statmt.org/wmt15/translation-task.html
