Language Technology at the University of Helsinki's repositories
UkrainianLT
A collection of links to Ukrainian language tools
OPUS-MT-testsets
Benchmarks for evaluating MT models
americasnlp2021-st
AmericasNLP 2021 shared task
murreviikko
Dialectologically annotated and normalized dataset of dialectal Finnish tweets
ndc-aligned
Word-aligned version of the Norwegian Dialect Corpus
OpusFilter-hub
A hub of OpusFilter configurations
americasnlp2023-st
AmericasNLP 2023 shared task (Helsinki fork)
building-nlp-apps-notebooks
Python notebook demos for the Building NLP Applications course
controlled_simplification_ru
A project on controlled Russian text simplification.
dial_align
Character alignment for normalized dialect corpora
murre
The amazing 🐕 will normalize non-standard Finnish/Swedish and dialectalize standard Finnish!
OPUS-MT-bot
Translation bot between Ukrainian and Czech
OPUS-MT-devsets
Development data for OPUS-MT
OPUS-MT-map
A map of available translation models
syntaxmaker
The NLG tool for Finnish
wikitextprocessor
Python package for processing Wikimedia dumps (Wiktionary, Wikipedia, etc.). Supports Wikitext parsing, template expansion, and Lua module execution. Intended for data extraction, bulk syntax checking, error detection, and offline formatting.
wiktextract
Wiktionary dump file parser and multilingual data extractor