ishalyminov / multitask_disfluency_detection

Code for the paper "Multi-Task Learning for Domain-General Spoken Disfluency Detection in Dialogue Systems" (Igor Shalyminov, Arash Eshghi, and Oliver Lemon)


Multitask disfluency detection

Code for the paper "Multi-Task Learning for Domain-General Spoken Disfluency Detection in Dialogue Systems" (Igor Shalyminov, Arash Eshghi, and Oliver Lemon) [SemDial 2018 paper] [Slides]

Model architecture

Getting started

  1. Set up the environment (the steps below use Conda):
$ cd code-directory
$ git submodule update --init
$ conda create -n multitask_disfluency python=2.7
$ conda activate multitask_disfluency
$ pip install -r requirements.txt
  2. Preprocess the Switchboard dataset for training:
$ python make_deep_disfluency_dataset.py swbd disfluency
  3. Train the model:
$ python train.py swbd model
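
The preprocessing and training commands above can be chained so that training only starts if preprocessing succeeds. The following is a minimal sketch, not part of the repository: `preprocess_and_train` is a hypothetical helper, and the `RUN` variable is an illustrative hook that prefixes each command (set `RUN=echo` to print the commands instead of executing them). It assumes the Conda environment is already active and the current directory is the repository root.

```shell
#!/bin/sh
# Hypothetical wrapper (not in the repository): run the preprocessing step,
# then the training step, stopping at the first failure.
# RUN is an illustrative hook; RUN=echo prints the commands without running them.
preprocess_and_train() {
    # Step 2 from the list above: preprocess Switchboard.
    ${RUN:-} python make_deep_disfluency_dataset.py swbd disfluency || return 1
    # Step 3 from the list above: train the model.
    ${RUN:-} python train.py swbd model
}
```

Calling `RUN=echo preprocess_and_train` shows the two commands in order; calling `preprocess_and_train` without `RUN` executes them.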

bAbI+ disfluency study data generation

  1. Get the bAbI tools and install requirements
  2. Download bAbI dialog tasks into the babi_tools folder
  3. Run sh make_generalization_study_datasets.sh <RESULT_FOLDER>
  4. Run sh tag_dataset.sh <RESULT_FOLDER> <config_file_name> for every config in 2018_generalization_study_configs
  5. The resulting datasets are <RESULT_FOLDER>/<BABI_DATASET_NAME>/*.tagged.json
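
Step 4 repeats the same command for each config, which can be sketched as a loop. This is a hedged example rather than a script shipped with the repository: `tag_all_configs` is a hypothetical helper, and it assumes that the configs are regular files directly inside `2018_generalization_study_configs` and that `tag_dataset.sh` expects the config's file name rather than its full path. The `RUN` variable is an illustrative hook; setting `RUN=echo` prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper (not in the repository): call tag_dataset.sh once per config.
# Assumptions: configs are regular files directly inside the config directory,
# and tag_dataset.sh takes the config's file name, not its full path.
tag_all_configs() {
    result_folder=$1
    config_dir=${2:-2018_generalization_study_configs}
    for config in "$config_dir"/*; do
        # Skip subdirectories and the unexpanded pattern of an empty directory.
        [ -f "$config" ] || continue
        name=$(basename "$config")
        # RUN=echo turns this into a dry run; stop at the first failing config.
        ${RUN:-} sh tag_dataset.sh "$result_folder" "$name" || return 1
    done
}
```

For example, `RUN=echo tag_all_configs <RESULT_FOLDER>` lists the commands that would be run, one per config, before committing to the full tagging pass.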

