midobal / qet

Quality Estimation Tagger.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Quality Estimation Tagger

This software generates word-level/phrase-level QE tags for a given translated text and its post-edited version. Following the WMT shared task's notation, correct words/phrases are tagged as OK, and errors are tagged as BAD.

Dependencies

The phrase-level tagger makes use of mgiza.

Usage

QET.py -t translation_file -pe post-edited_file {-p source_file mgiza_path}

Where:

  • translation_file is the translated text.
  • post-edited_file is the post-edited version of the translated text.
  • -p switches to phrase-level generation.
  • source_file is the original text.
  • mgiza_path is the path to mgiza's bin folder (e.g., /opt/mgiza/mgizapp/bin/).

Examples

Word-level QE

Source: El profundo mar azul .

Translation: The deep sea blue .

Post-edited: The deep blue sea .

Tags: OK OK BAD OK OK

Phrase-level QE

Source: El profundo mar azul .

Translation: The deep sea blue .

Post-edited: The deep blue sea .

Tags: OK BAD BAD OK OK

About

Quality Estimation Tagger.

License:MIT License


Languages

Language:Python 85.5%Language:Shell 14.5%