DialDoc21 Shared Task at ACL 2021 includes two subtasks for building goal-oriented document-grounded dialogue systems. The first subtask is to predict the grounding in the given document for the next agent response; the second subtask is to generate the agent response in natural language given the contexts.
This shared task is based on Doc2Dial v1.0.1 in the folder data/doc2dial. For more information about the dataset, please refer to the README, the paper, and the Doc2Dial Project Page.
Note: you can choose to utilize other public datasets in addition to the Doc2Dial data for training. See the example here.
The task is to predict the knowledge grounding, in the form of a document span, for the next agent response, given the dialogue history and the associated document.
- Input: the associated document and dialogue history.
- Output: the grounding text.
- Evaluation: exact match and F1 scores. Please refer to the script for more details.
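The exact-match and F1 metrics above follow the common SQuAD-style convention: strings are normalized, then compared exactly (EM) or by token overlap (F1). The official numbers come from the shared-task script; the sketch below only illustrates the idea, and the normalization details are assumptions.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def f1(prediction, reference):
    """Token-overlap F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For instance, a predicted span that differs from the reference only by an article or punctuation still gets an exact match of 1.0 after normalization.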
The task is to generate the next agent response in natural language given dialogue-based and document-based contexts.
- Input: the associated document and dialogue history.
- Output: the dialogue utterance.
- Evaluation: sacrebleu and human evaluation. Please refer to the script for more details. Stay tuned for more details about human evaluations.
Create a virtual environment
conda create -n ENV_NAME python=3.7
conda activate ENV_NAME
Install PyTorch
conda install pytorch cudatoolkit=10.2 -c pytorch
Install Huggingface Transformers, Datasets, and a few other dependencies
pip install -r requirements.txt
Install NVIDIA/apex
conda install -c conda-forge nvidia-apex
You can use Huggingface Datasets to load the Doc2Dial dataset. To load v1.0.1 for subtask 1, point to scripts/datasets/doc2dial/doc2dial.py.
from datasets import load_dataset
datasets = load_dataset("scripts/datasets/doc2dial/", "doc2dial_rc", cache_dir="YOUR_LOCAL_CACHE")
The script shows how to obtain the ID and the expected output format, given an agent turn, for the prediction or generation task.
Run HuggingFace QA on Doc2Dial
- For fine-tuning BERT on Doc2Dial:
  cd sharedtask-dialdoc2021/scripts/subtask1
  ./run_qa.sh
- Results on the validation set:
  # bert-base-uncased
  f1 = 56.29
  exact_match = 39.73
  # bert-large-uncased-whole-word-masking
  f1 = 62.98
  exact_match = 50.50
Evaluating your model output
- Output format and sample file: please see the format in the sample file.
- Evaluation script: please refer to the script for evaluating your model predictions.
  python sharedtask_utils.py --task subtask1 --prediction_json sample_files/sample_prediction_subtask1.json
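Your prediction file is a JSON document passed via --prediction_json. A minimal sketch of writing one; the exact ID scheme and field names are defined by the sample file and sharedtask_utils.py, so the "id"/"prediction_text" keys below are assumptions based on the SQuAD-style format, not the authoritative spec:

```python
import json

# Hypothetical entries for illustration only: check the sample file for the
# real ID scheme and field names before submitting.
predictions = [
    {"id": "dialogue-1_turn-3", "prediction_text": "You can file the form online."},
]

with open("my_prediction_subtask1.json", "w") as f:
    json.dump(predictions, f, indent=2)
```

Validate the resulting file against the provided sample prediction before running the evaluation script.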
Run HuggingFace Seq2Seq on Doc2Dial
- For generating the input files, we first create the source and target files. Please see the run script for the required parameters and the other default values.
  cd scripts/subtask2
  python seq2seq_utils.py --split validation --output_dir seq2seq_files
- For fine-tuning BART on Doc2Dial:
  cd scripts/subtask2
  ./run_seq2seq.sh
- Results on the validation set:
  # bart-large-cnn
  val_bleu = 17.72
Evaluating your model output
- Output format and sample file: please see the format in the sample file.
- Evaluation script: please refer to the script for evaluating your model predictions.
  python sharedtask_utils.py --task subtask2 --prediction_json sample_files/sample_prediction_subtask2.json
For the most up-to-date information about participating in the DialDoc21 Shared Task, please check our workshop page.