MaiNLP's repositories
awesome-human-label-variation
A curated list of awesome datasets with human label variation (un-aggregated labels) in Natural Language Processing and Computer Vision, accompanying the paper "The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation" (EMNLP 2022)
germanic-lrl-corpora
A survey of corpora for Germanic low-resource languages and dialects
How-to-distill-your-BERT
Code for the paper: How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives (ACL 2023)
noisydialect
Does manipulating tokenization aid cross-lingual transfer? A study on POS tagging for non-standardized languages
convert-qcri-4dialects
Converts the Four Arabic Dialects POS tagged Dataset (Darwish et al., 2018) to UPOS
maibaam-code
Code for preprocessing data for UD annotations and for tagging/parsing experiments of MaiBaam
syntax-pre-training-for-RE
Silver Syntax Pre-training for Cross-Domain Relation Extraction (Findings of ACL 2023)
dialect-ToD-robustness
Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties (EACL 2024)
mainlp.github.io
MaiNLP research lab
common-voice
Common Voice is part of Mozilla's initiative to help teach machines how real people speak.
conllueditor
Fork of Orange-OpenSource/conllueditor
convert-restaure-occitan
Converts the Annotated Corpus for Occitan (10.5281/zenodo.1182948, Bras et al., 2018) to UPOS by splitting contractions
Eevee
An Easy Annotation Tool for Natural Language Processing
el_esco
Codebase for Entity Linking in the Job Market Domain
label-variation-nli
Code used in More Labels or Cases? Assessing Label Variation in Natural Language Inference.
RC-analysis
Code for "What’s wrong with your model? A Quantitative Analysis of Relation Classification"
SkillSpan
SKILLSPAN: Competences as Spans for Skill Extraction from Job Postings
subspace-chronicles
How Linguistic Information Emerges, Shifts and Interacts during Language Model Training (EMNLP 2023)