Vilém Zouhar's repositories
tokenization-scorer
Simple-to-use scoring function for arbitrarily tokenized texts.
ryanize-bib
Highlight errors in a bib file: missing URLs, capitalization protection, etc.
sentence-embd-fusion
Sentence embeddings as artefacts fused to language models
SlowAlignDisplayer
Create "pretty" graphs for aligned sentences
annotation-logger
Micro server for collecting annotation data
bertalign
Multilingual sentence alignment using sentence embeddings
bio-mqm-dataset
Dataset and codebase for "Fine-Tuned Machine Translation Metrics Struggle in Unseen Domains"
euler-blame
Who is taking up space on Euler?
metaphor-preservation
Evaluating the preservation of metaphors in machine translation and paraphrasing
prism
MT Evaluation in Many Languages via Zero-Shot Paraphrasing
stolen-subwords
Zero-data blackbox machine translation model distillation / stealing
vosc-single
Verification of Scientific Claims - Single Source