jojonki / MultiWOZ-Parser

A parser of the Multi-Domain Wizard-of-Oz dataset (MultiWOZ)

Home Page: http://dialogue.mi.eng.cam.ac.uk/index.php/corpus/

MultiWOZ-Parser (Unofficial)

A parser for the Multi-Domain Wizard-of-Oz dataset (MultiWOZ). The dataset consists of 3,406 single-domain dialogues, which include booking where the domain allows it, and 7,032 multi-domain dialogues, each spanning 2 to 5 domains.

MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling. Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ramadan, Milica Gašić. EMNLP 2018. https://arxiv.org/abs/1810.00278

Large-Scale Multi-Domain Belief Tracking with Knowledge Sharing. Osman Ramadan, Paweł Budzianowski, Milica Gašić. ACL 2018. https://arxiv.org/abs/1807.06517

Dataset

You can download the dataset here.

Parsers

The parser comes in two forms: an IPython (Jupyter) notebook and a plain Python script. They are essentially the same; both walk through the data processing flow and show some sample data.

How to use the parser?

python parse_example.py --data_dir ./MultiWOZ/

Alternatively, open Parser.ipynb in Jupyter Notebook.
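To give a feel for what the parsing step involves, here is a minimal sketch of loading the dialogues and walking through their turns. It is not the code in parse_example.py. The file name data.json, the "log" and "text" keys, and the user-speaks-first turn order are assumptions about the dataset layout, and load_dialogues and iter_turns are hypothetical helper names.

```python
import json
from pathlib import Path


def load_dialogues(data_dir, filename="data.json"):
    """Load all dialogues from the dataset directory.

    Assumption: the directory holds one JSON file (here "data.json")
    that maps a dialogue ID to that dialogue's content.
    """
    path = Path(data_dir) / filename
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def iter_turns(dialogue):
    """Yield (speaker, text) pairs for one dialogue.

    Assumption: turns are stored in a "log" list, the user speaks
    first, and user and system turns strictly alternate.
    """
    for index, turn in enumerate(dialogue.get("log", [])):
        speaker = "user" if index % 2 == 0 else "system"
        yield speaker, turn.get("text", "").strip()
```

With the dataset unpacked into ./MultiWOZ/, calling load_dialogues("./MultiWOZ/") would return the dictionary of dialogues, and iter_turns can then be applied to each value.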

Domain Annotator (Unofficial) 🚧

I automatically annotated domains for the user turns, since the authors do not provide domain labels and domain classification is not a goal of their work.

  • Annotation rules
    • If the dialogue state (DST) is updated by a user turn, the updated domain is used as the label for that turn.
    • All the domains of a dialogue are listed in its goal property.
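The rules above can be sketched roughly as follows. This is an illustration on a simplified, hypothetical data layout, not the annotator's actual code: each belief-state snapshot is assumed to map a domain to its filled slots, the goal is assumed to map a domain to a non-empty dict when that domain is part of the dialogue, and all function names are made up for the example. Using the goal to filter the labels is one possible reading of the second rule.

```python
def changed_domains(prev_state, curr_state):
    """Return the domains whose slots differ between two snapshots."""
    return sorted(
        domain
        for domain, slots in curr_state.items()
        if slots != prev_state.get(domain, {})
    )


def goal_domains(goal):
    """Return the domains with a non-empty entry in the goal (assumed layout)."""
    return {domain for domain, spec in goal.items() if spec}


def annotate_user_turns(states_after_user_turns, goal):
    """Label each user turn with the domains whose state it updated.

    Labels are restricted to the domains listed in the goal. A turn
    that changes nothing gets an empty list, because the rules do not
    say how to label such turns.
    """
    allowed = goal_domains(goal)
    labels = []
    prev = {}
    for curr in states_after_user_turns:
        labels.append([d for d in changed_domains(prev, curr) if d in allowed])
        prev = curr
    return labels


# Toy data: one snapshot per user turn, holding only the filled slots.
toy_goal = {"hotel": {"info": {"area": "north"}}, "train": {"info": {"day": "monday"}}, "taxi": {}}
toy_states = [
    {"hotel": {"area": "north"}},
    {"hotel": {"area": "north", "stars": "4"}},
    {"hotel": {"area": "north", "stars": "4"}, "train": {"day": "monday"}},
    {"hotel": {"area": "north", "stars": "4"}, "train": {"day": "monday"}},
]
toy_labels = annotate_user_turns(toy_states, toy_goal)
```

On this toy input the first two turns are labeled with the hotel domain, the third with the train domain, and the last turn gets no label because the state did not change.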

About

License: Apache License 2.0


Languages

Jupyter Notebook 82.2%
Python 17.8%