Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties

This repository accompanies our paper Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties (Ekaterina Artemova, Verena Blaschke, & Barbara Plank, to be published at EACL 2024). It contains code for automatically applying morphosyntactic perturbation rules to German sentences in order to mimic grammatical structures found in colloquial varieties (details in the paper).

Usage conditions

We release this code for research purposes only, and expressly forbid usage for mockery or parody of any dialects or registers.

Dialect perturbations

We implemented 18 perturbations covering a wide range of dialect phenomena in German. The code is in the dialect_perturbations.py file, and example usage is demonstrated in the perturbation_test.ipynb notebook.

To run the perturbations, you need the dictionaries and word lists from the resources folder and the following packages:

SoMaJo for tokenization
spaCy for POS tagging
Stanza for POS tagging and dependency parsing
DERBI for inflection; at the moment, the 2022 version is needed for the code to run (integrated as a submodule here)
Pattern-de for verb conjugation

Installation

Clone the repository together with its submodule:

git clone --recursive git@github.com:mainlp/dialect-ToD-robustness.git

If you already cloned the repository without the --recursive flag:

cd dialect-ToD-robustness
git submodule init
git submodule update

Install dependencies:

python -m pip install -r requirements.txt

or:

# pip install jupyter  # optional; only for sample notebook
pip install GitPython
pip install pandas
pip install somajo
pip install stanza
pip install spacy
pip install pattern

Install the spaCy model:

python -m spacy download de_core_news_sm
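
As a quick sanity check after installation, the following sketch reports which of the packages listed above cannot be imported. It is not part of the repository. The import names are assumptions based on how these packages are usually imported (GitPython as git, Pattern as pattern), and the helper name find_missing_modules is made up for illustration.

```python
from importlib.util import find_spec

# Top-level import names assumed for the dependencies listed above.
REQUIRED_MODULES = ["git", "pandas", "somajo", "stanza", "spacy", "pattern"]


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages can be imported.")
```

This only checks that the modules can be located; it does not verify package versions or that the spaCy model has been downloaded.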

Human evaluation

The table in the human_eval folder contains the results of the human evaluation of the perturbations, rated on a Likert scale from 1 to 5. Each row corresponds to a pair of sentences in which one sentence is a perturbation of the other. The columns are as follows:

sentence: the intact sentence
perturbed_sentence: the perturbed sentence
perturbation: the perturbation applied
ann_x: the score from annotator x
ann_y: the score from annotator y
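
A minimal sketch of how such a table could be summarised, assuming it is a comma-separated file whose header uses exactly the column names above. The sample rows, rule names and scores below are made-up placeholders rather than data from the human_eval folder, and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Placeholder rows for illustration only; not taken from the repository.
SAMPLE = """sentence,perturbed_sentence,perturbation,ann_x,ann_y
s1,p1,rule_a,4,5
s2,p2,rule_a,2,3
s3,p3,rule_b,5,5
"""


def mean_score_per_perturbation(handle):
    """Average both annotators' scores for each perturbation."""
    scores = defaultdict(list)
    for row in csv.DictReader(handle):
        # Pool the two annotators' ratings under the rule that was applied.
        scores[row["perturbation"]].extend([int(row["ann_x"]), int(row["ann_y"])])
    return {name: mean(values) for name, values in scores.items()}


result = mean_score_per_perturbation(io.StringIO(SAMPLE))
for name, score in sorted(result.items()):
    print(name, score)
```

To use this on the actual table, pass an open file handle instead of the in-memory sample, and adjust the delimiter if the file is not comma-separated.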

Results

The plots folder contains the plots used in the main part of the paper and in Appendices C and D.

The results folder contains the result tables. Each table reports intent accuracy and slot F1 values for the intact and perturbed test sets.

Files are named according to the pattern '{train language}{dev language}.{test language}.{dataset}'. The suffix '1p' marks files where single perturbations are applied; in all other files, all perturbations are applied simultaneously.

Replication

To replicate the perturbation rules exactly as used in the paper (without potential later improvements), use this commit.

Cite us

@inproceedings{artemova-etal-2024-exploring,
  author    = {Artemova, Ekaterina and Blaschke, Verena and Plank, Barbara},
  title     = {Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties},
  booktitle = {Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics},
  year      = {2024},
  publisher = {Association for Computational Linguistics},
  note      = {To appear},
}
