vyraun / wmt-format-tools

Tools for formatting WMT hypothesis and test sets in XML

WMT Format Tools

Tools for handling the XML-formatted test sets and hypothesis files used in the WMT news translation task.

Installation

Requires Python >= 3.6.

pip install git+https://github.com/wmt-conference/wmt-format-tools.git

Preparing a WMT submission

  1. Download the XML file containing the source (e.g. newsdev2021.ha-en.source.xml).
  2. Extract the text from the source:
     wmt-unwrap -o newsdev2021.ha-en < newsdev2021.ha-en.source.xml
  3. Translate the extracted text with your system to produce a hypothesis file (e.g. newsdev2021.ha-en.hypo.en).
  4. Wrap the translation in XML, including your team name (here UEDIN):
     wmt-wrap -s newsdev2021.ha-en.source.xml -t newsdev2021.ha-en.hypo.en -n UEDIN -l en > newsdev2021.ha-en.hypo.en.xml
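If you prepare submissions for several language pairs, the unwrap and wrap commands can be scripted. The following is a minimal sketch that shells out to the installed `wmt-unwrap` and `wmt-wrap` commands with exactly the flags used in the steps above. It assumes both commands are on your PATH after installation; the helper names (`unwrap_cmd`, `wrap_cmd`, `unwrap`, `wrap`) are hypothetical and are not part of the package, and the translation step is left to your own system.

```python
import subprocess
from typing import List


def unwrap_cmd(out_prefix: str) -> List[str]:
    """Hypothetical helper: argv for `wmt-unwrap -o PREFIX` (source XML goes on stdin)."""
    return ["wmt-unwrap", "-o", out_prefix]


def wrap_cmd(source_xml: str, hypo: str, team: str, lang: str) -> List[str]:
    """Hypothetical helper: argv for `wmt-wrap -s SRC -t HYPO -n TEAM -l LANG`."""
    return ["wmt-wrap", "-s", source_xml, "-t", hypo, "-n", team, "-l", lang]


def unwrap(source_xml: str, out_prefix: str) -> None:
    # Same as: wmt-unwrap -o out_prefix < source_xml
    with open(source_xml, "rb") as src:
        subprocess.run(unwrap_cmd(out_prefix), stdin=src, check=True)


def wrap(source_xml: str, hypo: str, team: str, lang: str, out_xml: str) -> None:
    # Same as: wmt-wrap -s source_xml -t hypo -n team -l lang > out_xml
    with open(out_xml, "wb") as out:
        subprocess.run(wrap_cmd(source_xml, hypo, team, lang), stdout=out, check=True)
```

For example, `wrap("newsdev2021.ha-en.source.xml", "newsdev2021.ha-en.hypo.en", "UEDIN", "en", "newsdev2021.ha-en.hypo.en.xml")` reproduces the wrapping step. Passing `check=True` makes a failing command raise `CalledProcessError` instead of silently leaving an empty or partial XML file behind.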

API Usage

The tools can also be used from Python through their API; see test/test-wrap-unwrap.py for an example.
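The test script named above shows the package's actual API, which this sketch does not use. Instead, it illustrates with the standard library alone the kind of transformation the unwrap and wrap steps perform. The XML layout it assumes (`dataset` > `doc` > `src` > `p` > `seg`, plus a `hyp` block carrying `system` and `lang` attributes) is a simplified approximation, not a specification of the WMT format, and the function names are hypothetical; use the tools themselves for real submissions.

```python
import xml.etree.ElementTree as ET
from typing import List

# ASSUMPTION: a simplified WMT-style layout, for illustration only:
#   <dataset><doc id=..><src lang=..><p><seg id=..>text</seg>...</p></src></doc></dataset>
# Real WMT files may differ; use wmt-unwrap / wmt-wrap for actual submissions.


def extract_source_segments(xml_text: str) -> List[str]:
    """Return the source segments in document order (the 'unwrap' direction)."""
    root = ET.fromstring(xml_text)
    return [seg.text or ""
            for src in root.iter("src")
            for seg in src.findall("./p/seg")]


def add_hypotheses(xml_text: str, hyps: List[str], team: str, lang: str) -> str:
    """Attach a <hyp> block mirroring each <src> block (the 'wrap' direction)."""
    root = ET.fromstring(xml_text)
    docs = [doc for doc in root.iter("doc") if doc.find("src") is not None]
    n_src = sum(len(doc.find("src").findall("./p/seg")) for doc in docs)
    if n_src != len(hyps):
        # One hypothesis line per source segment, in the same order.
        raise ValueError("expected %d hypotheses, got %d" % (n_src, len(hyps)))
    it = iter(hyps)
    for doc in docs:
        # ASSUMPTION: the attribute names 'system' and 'lang' are illustrative.
        hyp = ET.SubElement(doc, "hyp", system=team, lang=lang)
        for p in doc.find("src").findall("p"):
            hyp_p = ET.SubElement(hyp, "p")
            for seg in p.findall("seg"):
                # Reuse the source segment id so hypothesis and source stay aligned.
                ET.SubElement(hyp_p, "seg", id=seg.get("id", "")).text = next(it)
    return ET.tostring(root, encoding="unicode")
```

Keeping the hypotheses in the same order as the extracted source segments is what lets the two directions round-trip; the length check catches a translation file with missing or extra lines before any XML is written.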

About

License: Apache License 2.0
