Gldkslfmsd / man-eval-decs
==========================

Data Description
================

data/ - raw translation, source, and reference data in the format used by MT-ComparEval

anotate-baseline-1Mvs250k -- directory with data for one annotation task (a quickjudge task; see the sketch below)
├── anotation.anot  -- plain quickjudge .anot file
├── anotation.coresp  -- quickjudge correspondence file (maps the shuffled lines back to the systems)
├── anotation-dm.anot  -- annotations by DM
├── anotation-dm.coresp -> anotation.coresp
├── anotation-ob.anot-raw -- prepared for OB
├── anotation-ob.coresp -> anotation.coresp
├── baseline-1M  -- translations from the baseline at 1M steps
├── baseline-250k  -- translations from the baseline at 250k steps
├── ref  -- reference translations
└── source  -- source sentences

...etc. -- analogous directories for the other annotation tasks listed below
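
The annotation files above come from quickjudge; a rough sketch of how such a task directory can be set up and handed to an annotator follows. The quickjudge options shown (--refs, --print) are assumptions written from memory, not verified against the current tool -- check the quickjudge README (https://github.com/ufal/quickjudge) before copying any of this.

	# hypothetical preparation of the 1M-vs-250k task (option names are assumptions)
	cd anotate-baseline-1Mvs250k
	quickjudge anotation --refs=source,ref baseline-1M baseline-250k
	#   -> anotation.anot    shuffled candidate translations for blind annotation
	#   -> anotation.coresp  mapping from the shuffled lines back to the system files

	# per-annotator working copies (this is what the -dm / -ob files in the tree are)
	cp anotation.anot anotation-dm.anot
	ln -s anotation.coresp anotation-dm.coresp

	# after annotating, print the marks with the system names revealed
	quickjudge anotation-dm --print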

Annotation Tasks
================

anotate-baseline-1Mvs250k
	- dev set
	- baseline at 250k vs. 1M training steps
	- DM: approx. 67% done

anotate-baseline_vs_taskid
	- dev set
	- 1M training steps
	- plain baseline vs. baseline with a task-identification token

anotate-ner5enrich_vs_baseline/
	- test set
	- 1M training steps
	- DM: 100% done
	
anotate-ner58gold_vs_baseline/
	- test set
	- 1M training steps
	- DM: 100% done

TODO

Annotation Manual
=================

- we compare pairs of runs
- baseline vs. baseline tasks:
	- compare the overall quality of the translations

	mark | meaning
	===============
	sb   | substantially better -- mark only the better translation; the other one gets no mark
	b    | better
	eg   | equally good -- mark both translations (if they differ at all)
	eb   | equally bad -- mark both translations (if they differ at all)
	?    | ?

- ner vs. baseline tasks:
	- compare the overall quality of the translations

	mark   | meaning
	===============
	sb _   | substantially better -- mark only the better translation; the other one gets no mark (_ = no mark)
	b _    | better
	eg eg  | equally good -- mark both translations (if they differ at all)
	eb eb  | equally bad -- mark both translations (if they differ at all)
	?      | ?

	- ...and, in addition, the overall translation of named entities (NE)
		- use the same marks, comma-delimited, e.g. b,eb = overall translation better, NE translation equally bad
	

- other comments/marks are optional and can be dropped later
- use quickjudge (https://github.com/ufal/quickjudge) and simple shell commands to sum the results; see the sketch below
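
For summing up the marks, something along the following lines is enough. This is only a sketch: it assumes the usual quickjudge convention of typing the mark at the beginning of the annotated line, separated from the translation by a tab, and it assumes the other task directories follow the same layout as the tree above -- adjust the paths and the field extraction if your files differ.

	# tally how often each mark occurs in one annotator's file
	cut -f1 anotate-baseline-1Mvs250k/anotation-dm.anot \
		| grep -Ex 'sb|b|eg|eb|\?' \
		| sort | uniq -c

	# ner tasks: split the comma-delimited pair into the overall mark (field 1)
	# and the NE mark (field 2) and count the two separately
	cut -f1 anotate-ner5enrich_vs_baseline/anotation-dm.anot \
		| grep -Ex '(sb|b|eg|eb|\?)(,(sb|b|eg|eb|\?))?' \
		| awk -F, '{ overall[$1]++; if (NF > 1) ne[$2]++ }
		           END { for (m in overall) print "overall", m, overall[m];
		                 for (m in ne)      print "NE",      m, ne[m] }'

	# which system each marked line came from is recorded in the *.coresp file;
	# quickjudge itself can print the annotations with the system names revealed
	# (see its README for the exact option)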
