gorkemozkaya / nmt-en-tr

Neural Machine Translation Between English and Turkish with pre-trained model releases


Neural Machine Translation Between English and Turkish

Gorkem Ozkaya

This repo contains a pair of NMT models for translating between English and Turkish, one for each direction. The pre-trained models can be downloaded from the releases section; the Jupyter/Colab notebooks below, located in the root directory, serve as a reference for downloading and running them.

  • TF1 version: Open In Colab
  • TF2 version: Open In Colab
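
As a rough illustration of fetching a pre-trained model from the releases section outside the notebooks, the sketch below builds a download URL using GitHub's standard release-asset URL pattern. The tag and asset names in the usage comment are placeholders, not actual release names from this repo, and the helper functions are hypothetical; check the releases page for the real values.

```python
from urllib.parse import quote
from urllib.request import urlretrieve


def release_asset_url(owner, repo, tag, asset):
    """Build the download URL for a GitHub release asset."""
    return (
        f"https://github.com/{owner}/{repo}/releases/download/"
        f"{quote(tag)}/{quote(asset)}"
    )


def download_release_asset(owner, repo, tag, asset, dest=None):
    """Download a release asset and return the local file path."""
    url = release_asset_url(owner, repo, tag, asset)
    path, _headers = urlretrieve(url, dest or asset)
    return path


# Usage (RELEASE_TAG and MODEL_ARCHIVE are placeholders, not real names):
# download_release_asset("gorkemozkaya", "nmt-en-tr", "RELEASE_TAG", "MODEL_ARCHIVE")
```

The notebooks remain the reference for which files to fetch and how to load them.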

Interactive demos for the TF2 version of the models are available on HuggingFace 🤗 Spaces.

The models are trained on Google Cloud TPUs, using the tensor2tensor library for the TF1 version and TensorFlow's official models library for the TF2 version. Both use the Transformer architecture with the transformer_tpu hyperparameter configuration.
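
For readers unfamiliar with the Transformer, its core operation is scaled dot-product attention. The sketch below is a minimal single-head version in plain Python so that it runs without TensorFlow; it is an illustration only, not the training code of this repo, and it omits masking, multiple heads, and the learned projections that the real models use.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention over lists of vectors.

    queries: list of vectors of length d_k
    keys:    list of vectors of length d_k
    values:  list of vectors, one per key
    Returns (outputs, weights), with one row per query.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights
```

Each output row is a weighted average of the value vectors, with weights determined by how closely the query matches each key.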

Acknowledgements

  • TFRC (TensorFlow Research Cloud) program, for Cloud TPU hours
  • Opus parallel corpus, for making the Turkish/English parallel corpus available
  • Open Subtitles, as the original source of the movie subtitles parallel corpus. Also see P. Lison and J. Tiedemann, 2016, "OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles", in Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016).
  • SETIMES, as the original source of the news articles corpus



Languages

Jupyter Notebook 65.4%, Python 25.6%, Shell 9.0%