Multitask disfluency detection

Code for the paper "Multi-Task Learning for Domain-General Spoken Disfluency Detection in Dialogue Systems" by Igor Shalyminov, Arash Eshghi, and Oliver Lemon (SemDial 2018).

Model architecture

[Figure: model architecture diagram]
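The paper's title describes a multi-task model. As a rough illustration only, the sketch below assumes a shared encoder that outputs, for every word, a distribution over disfluency tags (the main task) and a distribution over the next word (an auxiliary language-modelling task), and that training minimises a weighted sum of the two per-token cross-entropies. The function names, the lm_weight parameter and the toy probabilities are hypothetical; this is not the repository's TensorFlow code.

```python
import math


def cross_entropy(prob_rows, gold_indices):
    """Mean negative log-likelihood of the gold labels.

    prob_rows: one probability distribution (list of floats) per token.
    gold_indices: the index of the gold label for each token.
    """
    assert len(prob_rows) == len(gold_indices)
    total = 0.0
    for probs, gold in zip(prob_rows, gold_indices):
        total -= math.log(probs[gold])
    return total / len(gold_indices)


def multitask_loss(tag_probs, gold_tags, lm_probs, next_words, lm_weight=0.5):
    """Hypothetical joint objective: tagging loss plus a weighted LM loss.

    lm_weight is an illustrative hyperparameter, not a value from the paper.
    """
    tag_loss = cross_entropy(tag_probs, gold_tags)
    lm_loss = cross_entropy(lm_probs, next_words)
    return tag_loss + lm_weight * lm_loss


# Toy example: 2 tokens, 2 disfluency tags, 3-word vocabulary.
tag_probs = [[0.9, 0.1], [0.2, 0.8]]
lm_probs = [[0.1, 0.7, 0.2], [0.6, 0.2, 0.2]]
loss = multitask_loss(tag_probs, [0, 1], lm_probs, [1, 0], lm_weight=0.5)
print(round(loss, 4))
```

Setting lm_weight to 0 reduces the objective to plain disfluency tagging, which is a convenient sanity check when comparing single-task and multi-task runs.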

Getting started

Set up the environment (below are steps for Conda):

$ cd code-directory
$ git submodule update --init
$ conda create -n multitask_disfluency python=2.7
$ conda activate multitask_disfluency
$ pip install -r requirements.txt

Preprocess the Switchboard dataset for training:

$ python make_deep_disfluency_dataset.py swbd disfluency
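The preprocessing script's output format is not described here. The sketch below illustrates the general shape of word-level disfluency tagging data under an assumed, simplified tag set (f, rm, e and rp are hypothetical labels and not necessarily what make_deep_disfluency_dataset.py emits): every word gets exactly one tag, and words and tags are mapped to integer ids before training.

```python
# Hypothetical, simplified word-level tag set for illustration only:
#   f  = fluent word, rm = reparandum (the part being replaced),
#   e  = edit term (e.g. "uh", "I mean"), rp = repair.
TAGS = ["f", "rm", "e", "rp"]
TAG_TO_ID = dict((tag, i) for i, tag in enumerate(TAGS))


def encode_example(tokens, tags, word_to_id, unk_id=0):
    """Turn one tagged utterance into parallel lists of word and tag ids."""
    if len(tokens) != len(tags):
        raise ValueError("every token needs exactly one tag")
    word_ids = [word_to_id.get(tok, unk_id) for tok in tokens]
    tag_ids = [TAG_TO_ID[tag] for tag in tags]
    return word_ids, tag_ids


# Toy utterance: "to boston" is replaced by "to denver", with "uh" as the edit term.
tokens = "i want a flight to boston uh to denver".split()
tags = ["f", "f", "f", "f", "rm", "rm", "e", "rp", "rp"]

# Build a toy vocabulary; id 0 is reserved for unknown words.
vocab = {"<unk>": 0}
for tok in tokens:
    vocab.setdefault(tok, len(vocab))

word_ids, tag_ids = encode_example(tokens, tags, vocab)
print(list(zip(tokens, tags)))
print(word_ids)
print(tag_ids)
```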

Train the model:

$ python train.py swbd model
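If the auxiliary task is next-word language modelling over the same input (an assumption, not something this README states), each training position needs two targets: the word's disfluency tag and the word that follows it. A minimal sketch of deriving both; make_multitask_targets and eos_id are hypothetical names:

```python
def make_multitask_targets(word_ids, tag_ids, eos_id):
    """Pair each input word with its disfluency tag and the next word.

    The language-modelling target at position i is the word at i + 1;
    the last position predicts a hypothetical end-of-sequence id.
    """
    if len(word_ids) != len(tag_ids):
        raise ValueError("word_ids and tag_ids must have the same length")
    lm_targets = list(word_ids[1:]) + [eos_id]
    return list(zip(word_ids, tag_ids, lm_targets))


examples = make_multitask_targets([1, 2, 3, 4], [0, 0, 1, 3], eos_id=99)
for word, tag, next_word in examples:
    print("word=%d tag=%d next=%d" % (word, tag, next_word))
```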

bAbI+ disfluency study data generation

1. Get the bAbI tools and install their requirements.
2. Download the bAbI dialog tasks into the babi_tools folder.
3. Generate the datasets:
   $ sh make_generalization_study_datasets.sh <RESULT_FOLDER>
4. For every config in 2018_generalization_study_configs, tag the datasets:
   $ sh tag_dataset.sh <RESULT_FOLDER> <config_file_name>

The resulting datasets are written to <RESULT_FOLDER>/<BABI_DATASET_NAME>/*.tagged.json.
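The tagging script has to be run once for every config in 2018_generalization_study_configs. The Python sketch below automates that loop and then lists the generated files. It assumes it is run from the directory containing tag_dataset.sh and that the script expects a config's file name rather than its full path; tag_commands, run_all and tagged_outputs are hypothetical helpers, not part of the repository.

```python
import glob
import os
import subprocess


def tag_commands(result_folder, config_dir):
    """Build one 'sh tag_dataset.sh' command per config file in config_dir.

    Assumes the script takes the config's file name (not its full path),
    as the <config_file_name> placeholder suggests. Subdirectories are skipped.
    """
    commands = []
    for name in sorted(os.listdir(config_dir)):
        if os.path.isfile(os.path.join(config_dir, name)):
            commands.append(["sh", "tag_dataset.sh", result_folder, name])
    return commands


def run_all(result_folder, config_dir="2018_generalization_study_configs"):
    """Run the tagging script for every config; raise on the first failure."""
    for cmd in tag_commands(result_folder, config_dir):
        subprocess.check_call(cmd)


def tagged_outputs(result_folder):
    """List the <RESULT_FOLDER>/<BABI_DATASET_NAME>/*.tagged.json files."""
    return sorted(glob.glob(os.path.join(result_folder, "*", "*.tagged.json")))
```

For example, run_all("results") followed by tagged_outputs("results") would tag every dataset and return the sorted list of *.tagged.json paths.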
