anoopkunchukuttan / moses_job_scripts

A simple experiment management system for Moses

README for Moses Job Scripts

Version 1.0

moses_job_scripts is a toolkit that helps with basic tasks in training and developing statistical machine translation (SMT) systems. It contains:

  • Testbench to automate Moses-based SMT system training, tuning and evaluation (phrase based and factored)
  • Batch training, tuning and evaluation of Moses models and extraction of results

Preparing your SMT environment

  • The first task is to install the following software:
    • Moses 0.91+
    • Giza++
    • SRILM
    • IRSTLM [optional]
    • METEOR or meteor_indic (an adaptation of the METEOR tool for Indian languages) [optional]
    • TER [optional]
  • moses_job_scripts requires Moses and the related software to be installed according to the directory layout described in the section 'Directory layout for SMT software installation'.
  • Each of these tools has its own prerequisites. Please check the README for each tool. You can use the following as a guide for installing all of this software:

Yet Another Moses Installation Guide

NOTE: Installing these tools is fairly complicated, and a script that automates installation of the entire system would be desirable.
If you end up writing one, sharing it would be appreciated. Please mail me at anoopk@cse.iitb.ac.in.
  • Edit the file and set the variables:
    • SMT_SYSTEM_DIR: The path to the smt directory
    • SMT_METRICS_DIR: The path to the smt_eval_metrics directory

Directory layout for SMT software installation

smt
|--- giza-pp (compiled giza++ source code)
|--- bin (contains giza++ binaries - giza-pp/mkcls-v2/mkcls giza-pp/GIZA++-v2/GIZA++ giza-pp/GIZA++-v2/snt2cooc.out )
|--- moses_job_scripts (contains scripts to run the entire SMT workflow. It is the directory containing this README )
|--- mosesdecoder (moses decoder)
|--- srilm (srilm)
|--- irstlm (irstlm) [optional]

smt_eval_metrics
|--- meteor (Meteor/meteor_indic)
|--- ter (TER)
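
A quick way to confirm that an installation matches the layout above is a small shell check. The following is only a sketch: check_smt_layout is a hypothetical helper, not a script shipped with moses_job_scripts, and it only tests that the directories exist, not that the tools inside them are built correctly. It treats irstlm, meteor and ter as optional, as marked in the software list.

```shell
#!/usr/bin/env bash
# Sketch: verify the directory layout described in this README.
# check_smt_layout is a hypothetical helper, not part of moses_job_scripts.

check_smt_layout() {
    local smt_dir="$1"       # the smt directory (SMT_SYSTEM_DIR)
    local metrics_dir="$2"   # the smt_eval_metrics directory (SMT_METRICS_DIR)
    local missing=0
    local d

    # Required components under the smt directory
    for d in giza-pp bin moses_job_scripts mosesdecoder srilm; do
        if [ ! -d "$smt_dir/$d" ]; then
            echo "MISSING (required): $smt_dir/$d"
            missing=1
        fi
    done

    # Optional components: report them, but do not fail
    for d in "$smt_dir/irstlm" "$metrics_dir/meteor" "$metrics_dir/ter"; do
        if [ ! -d "$d" ]; then
            echo "missing (optional): $d"
        fi
    done

    # Returns 0 when all required directories exist, 1 otherwise
    return "$missing"
}
```

Assuming both variables are set in the current shell, it could be called as: check_smt_layout "$SMT_SYSTEM_DIR" "$SMT_METRICS_DIR"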

Using the testbench to train and evaluate a translation system

Once the SMT environment is ready, you can use the testbench to run an experiment that trains and evaluates a translation system.

  1. Create the parallel corpus files to be used for the experiment. The files must be in a single directory and must be named as follows:

    train.<src_lang> (e.g. train.en)
    train.<tgt_lang>
    test.<src_lang>
    test.<tgt_lang>
    tun.<src_lang>
    tun.<tgt_lang>

  2. Create a configuration file that specifies the experiment settings. A sample configuration file can be found here:
    sample_data/sample_moses_params.conf

  3. Run the following command: moses_run.sh <config_file> [notune|notrain]

    notune: if provided, tuning and evaluation of the tuned model are skipped.
    notrain: if provided, training and evaluation of the trained model are skipped. Only tuning is done, and the workspace is assumed to contain all model files generated by an earlier training run.

    The intermediate and final outputs are written to the $WORKSPACE directory.

    The workspace will contain the following directories:

  • log: contains various log files

  • cleaned: cleaned up corpus

  • lm: The target side language model

  • moses_data: The intermediate files and model output after training

  • tuning: The intermediate files and output generated after tuning

  • evaluation: evaluation results

  • run_params.conf: A copy of the config file for the experiment

    The important files for observing the output are:

  • evaluation/test_no_tun.<tgt_lang> : output from untuned model

  • evaluation/test.<tgt_lang> : output from tuned model

  • evaluation/results_wo_tuning/summary.txt : evaluation results without tuning

  • evaluation/results_with_tuning/summary.txt: evaluation results with tuned model

  • moses_data/model/moses.ini : untuned model file

  • tuning/moses.ini : tuned model file
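
The naming conventions in the steps above lend themselves to two small shell helpers: one that checks the corpus directory before a run, and one that prints the evaluation summaries afterwards. This is a sketch under stated assumptions: check_corpus_dir and show_summaries are hypothetical names, not part of moses_job_scripts, and the only paths used are the ones listed in this README.

```shell
#!/usr/bin/env bash
# Sketch: pre-run and post-run helpers based on the file names in this README.
# check_corpus_dir and show_summaries are hypothetical helpers.

# Check that train/test/tun files exist for both languages.
# Usage: check_corpus_dir <corpus_dir> <src_lang> <tgt_lang>
check_corpus_dir() {
    local corpus_dir="$1" src="$2" tgt="$3"
    local missing=0 split lang f

    for split in train test tun; do
        for lang in "$src" "$tgt"; do
            f="$corpus_dir/$split.$lang"
            if [ ! -f "$f" ]; then
                echo "missing corpus file: $f"
                missing=1
            fi
        done
    done

    # Returns 0 when all six files exist, 1 otherwise
    return "$missing"
}

# Print the evaluation summaries found in a workspace.
# Usage: show_summaries <workspace_dir>
show_summaries() {
    local workspace="$1"
    local run f

    for run in results_wo_tuning results_with_tuning; do
        f="$workspace/evaluation/$run/summary.txt"
        echo "== $run =="
        if [ -f "$f" ]; then
            cat "$f"
        else
            # For example, a notune run produces no tuned-model summary
            echo "(no summary found; this stage may have been skipped)"
        fi
    done
}
```

For an English-to-Hindi experiment, for example, one might run check_corpus_dir /path/to/corpus en hi before calling moses_run.sh, and show_summaries "$WORKSPACE" once the run has finished. The paths and language codes here are placeholders.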

Batch Operations

To be documented

Author

Anoop Kunchukuttan ( anoop.kunchukuttan@gmail.com )

Revision Log

1.0 : Stable release

LICENSE

Copyright Anoop Kunchukuttan 2013 - present

Moses Job Scripts is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Moses Job Scripts is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with Moses Job Scripts. If not, see http://www.gnu.org/licenses/.
