HanleiZhang / THUMT

An open-source neural machine translation toolkit developed by Tsinghua Natural Language Processing Group

THUMT: An Open Source Toolkit for Neural Machine Translation

Introduction

Machine translation is a natural language processing task that aims to translate natural languages automatically using computers. Recent years have witnessed the rapid development of end-to-end neural machine translation, which has become the new mainstream method in practical MT systems.

THUMT is an open-source toolkit for neural machine translation developed by the Natural Language Processing Group at Tsinghua University. The website of THUMT is: http://thumt.thunlp.org/.

Online Demo

The online demo of THUMT is available at http://translate.thumt.cn/. Supported languages include Ancient Chinese, Arabic, Chinese, English, French, German, Indonesian, Japanese, Portuguese, Russian, and Spanish.

Implementations

THUMT currently has two main implementations: THUMT-Theano and THUMT-TensorFlow.

The following table summarizes the features of the two implementations:

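| Implementation | Model                          | Criterion     | Optimizer           | LRP                    |
|----------------|--------------------------------|---------------|---------------------|------------------------|
| Theano         | RNNsearch                      | MLE, MRT, SST | SGD, AdaDelta, Adam | RNNsearch              |
| TensorFlow     | Seq2Seq, RNNsearch, Transformer | MLE           | Adam                | RNNsearch, Transformer |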

We recommend using THUMT-TensorFlow, which delivers better translation performance than THUMT-Theano. We will keep adding new features to THUMT-TensorFlow.

It is also possible to exploit layer-wise relevance propagation to visualize the relevance between source and target words with THUMT:

[Figure: Visualization with LRP]
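
To give a rough idea of what layer-wise relevance propagation computes, the following is a minimal NumPy sketch of one common LRP variant (the epsilon rule) applied to a single bias-free linear layer. It is not taken from THUMT: the function name `lrp_linear`, the `eps` parameter, and the toy inputs are all hypothetical, and THUMT's own implementation for RNNsearch and Transformer may differ substantially.

```python
import numpy as np


def lrp_linear(x, W, relevance_out, eps=1e-6):
    """Redistribute relevance from the outputs of z = x @ W back to the inputs.

    Hypothetical helper illustrating the LRP epsilon rule for one bias-free
    linear layer; not part of THUMT.
    """
    z = x @ W                                        # pre-activations, shape (n_out,)
    stabilizer = eps * np.where(z >= 0, 1.0, -1.0)   # keeps the division away from zero
    s = relevance_out / (z + stabilizer)             # relevance per unit of pre-activation
    return x * (W @ s)                               # input relevances, shape (n_in,)


# Toy example with made-up numbers (assumption: 4 inputs, 3 outputs).
rng = np.random.default_rng(0)
x = rng.normal(size=4)
W = rng.normal(size=(4, 3))
relevance_out = np.abs(x @ W)                        # arbitrary relevance assigned to outputs

relevance_in = lrp_linear(x, W, relevance_out)

# Relevance is approximately conserved across the layer.
print(relevance_in.sum(), relevance_out.sum())
```

In a translation model, applying such a rule layer by layer from a target word back to the source embeddings would yield one relevance score per source word, which is the kind of quantity a source-target relevance visualization displays.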

Notable Features

License

The source code is dual licensed. Open source licensing is under the BSD-3-Clause license, which allows free use for research purposes. For commercial licensing, please email thumt17@gmail.com.

Citation

Please cite the following paper:

Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng, Maosong Sun, Huanbo Luan, Yang Liu. 2017. THUMT: An Open Source Toolkit for Neural Machine Translation. arXiv:1706.06415.

Development Team

Project leaders: Maosong Sun, Yang Liu, Huanbo Luan

Project members: Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng

Contributors

Contact

If you have questions, suggestions, or bug reports, please email thumt17@gmail.com.

Derivative Repositories

  • Document-Transformer (Improving the Transformer Translation Model with Document-Level Context)
  • PR4NMT (Prior Knowledge Integration for Neural Machine Translation using Posterior Regularization)

About

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 100.0%