Repositories of Language Technology at the University of Helsinki
OPUS-MT-train
Training open neural machine translation models
OpusFilter
Parallel corpus processing toolkit
OPUS-MT-testsets
Benchmarks for evaluating MT models
neural-search-tutorials
Additional Notebooks for the Building NLP Applications course
opus-fast-mosestokenizer
C++ mosestokenizer (OPUS fork)
uncertainty-aware-nli
Uncertainty-aware fine-tuning of transformers with NLI data
dialect-topic-model
Scripts and metadata for the paper "Corpus-based dialectometry with topic models"
External-MT-leaderboard
Leaderboards for external MT models
OPUS-MT-leaderboard-recipes
Makefile recipes shared between all leaderboard repos
OpusDistillery
Training pipelines for Firefox Translations neural machine translation models (adapted for OPUS-MT and integrating GreenNLP metrics)
Contributed-MT-leaderboard
Leaderboard of contributed MT results
eflomal
Efficient Low-Memory Aligner
lowres-spain-st
Scripts and data from the Helsinki-NLP participation in the WMT24 shared task "Translation into Low-Resource Languages of Spain"
swa_gaussian
Code repo for "A Simple Baseline for Bayesian Uncertainty in Deep Learning" (Helsinki-NLP fork)