NathanDuran / Sentence-Encoding-for-DA-Classification

Supervised and Semi-supervised Sentence Encoding Methods for Dialogue Act Classification

Sentence Encoding Methods for Dialogue Act Classification

Overview

This repository contains all code, data and results/analysis for the paper Sentence Encoding for Dialogue Act Classification.

We investigated the process of generating single-sentence representations for the purpose of Dialogue Act (DA) classification, including several aspects of text pre-processing and input representation which are often overlooked or underreported within the literature, for example, the number of words to keep in the vocabulary or in the input sequences. We assessed each of these with respect to two DA-labelled corpora, the Switchboard Dialogue Act corpus and Maptask, using a range of supervised models which represent those most frequently applied to the task. Additionally, we compared these supervised techniques with transfer learning via pre-trained language models based on the transformer architecture, such as BERT, RoBERTa, and XLNET, which have thus far not been widely explored for the DA classification task. Our findings indicate that these text pre-processing considerations do have a statistically significant effect on classification accuracy. Notably, the optimal input sequence length and vocabulary size are much smaller than is typically used in DA classification experiments, yielding no significant improvements beyond certain thresholds. We also show that in some cases the contextual sentence representations generated by language models do not reliably outperform supervised methods, though BERT and its derivative models do represent a significant improvement over supervised approaches and over much of the previous work on DA classification.

Contents

  1. Usage
  2. Models
  3. Datasets
  4. Embeddings
  5. Results

Usage

Setup

  1. Clone this repository
git clone https://github.com/NathanDuran/Sentence-Encoding-for-DA-Classification.git
  2. Install requirements
pip install -r requirements.txt

Note: All of the models within this repository are implemented in TensorFlow 1.15. However, several of the transformer-based models were implemented using the Huggingface Transformers library, which requires TensorFlow 2.x.

  3. (Optional) Set up Comet.ml

All of the experiment run scripts are configured to record results with Comet.ml. You simply need to add your Comet workspace and project name to the run script. However, this is optional, as all training and testing data is also saved locally. By default, Comet will not be used (but the Python package must still be installed).
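For reference, the Comet configuration amounts to something like the following sketch; the workspace and project names below are placeholders, and the exact call in the run scripts may differ.

# Minimal sketch of Comet.ml experiment creation (placeholder values).
from comet_ml import Experiment

experiment = Experiment(
    api_key="YOUR_API_KEY",          # or set the COMET_API_KEY environment variable
    workspace="your-workspace",
    project_name="your-project",
)

# Parameters and metrics can then be logged against this experiment, e.g.
experiment.log_parameters({"batch_size": 32, "num_epochs": 10})
experiment.log_metric("val_accuracy", 0.75)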

Run Classifier

There are three run scripts included, and you must run the correct one for a given model. For a full list, see the models readme.

  • All of the supervised models are implemented in eager mode, so can be run with run_eager.py
  • Because TF 1.x Hub modules, and several of the other language model implementations, require graph mode, they must be run using run_graph.py
  • BERT and ALBERT can be run using run_bert.py or the transformers.ipynb
  • For RoBERTa, GPT2, DialoGPT and XLNET, it is suggested that you use the transformers.ipynb notebook, which can be opened in Colab

To run an experiment with specific parameters, edit the Python dictionary declared at the beginning of each run script. The 'experiment_params' dictionary contains all of the parameters necessary for specifying the model, text pre-processing, embeddings, number of training epochs, batch size, etc.

Example Experiment Params:

{
    'task_name': 'swda',  # What dataset to use
    'experiment_type': 'vocab_size',  # What parameter is being tested (only needed for output csv)
    'experiment_name': 'text_cnn_1',
    'model_name': 'text_cnn',  # What model to use, valid name from models.py
    'training': True,
    'testing': True,
    'save_model': True,  # Whether to save model weights
    'load_model': True,  # Whether to load best weights before testing, or load weights file before training
    'init_ckpt_file': None,  # Optional checkpoint file to initialise model weights before training
    'batch_size': 32,
    'num_epochs': 10,
    'evaluate_steps': 500,  # Apply model to validation set per this many training batches
    'early_stopping': False,  # Whether to stop training after no improvement in metrics
    'patience': 3,  # Number of epochs to wait for early_stopping
    'vocab_size': 10000,  # Number of words to keep in the vocabulary during pre-processing
    'max_seq_length': 128,  # Number of tokens to keep in the input sequence
    'to_tokens': True,  # Tokenise input, or keep as string (for some language models)
    'to_lower': True,  # Lowercase all words
    'use_punct': True,  # Keep punctuation
    'train_embeddings': True,
    'embedding_dim': 50,
    'embedding_type': 'glove',  # What embedding type to use, valid name from embeddings_processor.py
    'embedding_source': 'glove.6B.50d'  # Name of the embeddings file
}

Run Optimiser

We also used Comet's hyperparameter optimisation to find suitable hyperparameters for each of the models. The parameters for each model are specified in model_optimisation_configs.json, and there are equivalent run scripts for each of the different model types specified above (run_eager_optimiser.py, run_graph_optimiser.py, and run_bert_optimiser.py).
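For illustration only, a Comet optimiser run has roughly the following shape; the parameter names and values below are placeholders, and the actual entries in model_optimisation_configs.json may use a different schema.

# Illustrative Comet optimiser sketch (placeholder values, not this repository's config).
from comet_ml import Optimizer

opt_config = {
    "algorithm": "bayes",
    "parameters": {
        "learning_rate": {"type": "float", "min": 0.0001, "max": 0.01},
        "dropout_rate": {"type": "float", "min": 0.1, "max": 0.5},
        "dense_units": {"type": "integer", "min": 64, "max": 512},
    },
    "spec": {"metric": "val_accuracy", "objective": "maximize"},
}

optimizer = Optimizer(opt_config)
for experiment in optimizer.get_experiments(project_name="your-project"):
    learning_rate = experiment.get_parameter("learning_rate")
    # ...build and train the model with the suggested parameters, then report
    # the metric being optimised:
    experiment.log_metric("val_accuracy", 0.0)  # placeholder value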

All of the optimisation results can be viewed in the Comet project.

Models

All models are implemented using TensorFlow and Keras. For each of the 6 Transformer-based models we additionally use the Huggingface Transformers library. For the language models, tokenisation was performed as appropriate for each model. For example, most Transformer-based language models use WordPiece or SentencePiece tokenisation, and BERT requires sequences to have a special [CLS] token prepended. This tokenisation was performed using the HuggingFace Transformers library, to maintain consistency with the vocabulary and any special tokens associated with the particular model.
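As a concrete illustration (not taken from this repository's code), tokenising a single utterance for BERT with the Transformers library looks something like this:

# Illustrative BERT tokenisation with HuggingFace Transformers (not this repository's code).
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer.encode_plus(
    "okay so how do you feel about that",
    add_special_tokens=True,   # prepends [CLS] and appends [SEP]
    max_length=128,
    padding="max_length",
    truncation=True,
)
tokens = tokenizer.convert_ids_to_tokens(encoded["input_ids"])
# tokens starts with '[CLS]', followed by the WordPiece tokens of the utterance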

All of the model implementations are in models.py and implement the same abstract class. Valid model names are listed in the get_model() function at the top of models.py. Some of the models require, or in the case of the language models have been wrapped in, custom Keras layers. These are all defined in the layers directory.

Each model has default parameters defined; however, they can also be specified (or changed) in model_params.json. These include the learning rate, optimiser, dropout, number of hidden units, etc.

Example Model Params:

"cnn": {
"learning_rate": 0.002,
"optimiser": "adam",
"num_filters": 64,
"kernel_size": 5,
"pool_size": 8,
"dropout_rate": 0.27,
"dense_units": 224
}
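As a rough sketch of how these pieces fit together (the exact interface of get_model() is an assumption; see models.py and the run scripts for how the parameters are actually passed):

import json

# Illustrative only: look a model up by its valid name and load its hyperparameters.
from models import get_model

with open("model_params.json") as file:
    cnn_params = json.load(file)["cnn"]

model = get_model("cnn")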

Datasets

Datasets are acquired and processed via the data_processor.py script. First, a DataProcessor must be instantiated with the task_name and various other options for data pre-processing. The entire dataset can then be downloaded and processed with the get_dataset() function, which saves a local copy in .npz format. The train, test and validation sets can then be built using one of the 'build dataset' functions.
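A minimal sketch of that workflow is shown below; the argument names and the 'build dataset' call are illustrative, and the real signatures are defined in data_processor.py.

# Illustrative only: argument names and the 'build dataset' function are assumptions.
from data_processor import DataProcessor

processor = DataProcessor(task_name="swda", vocab_size=10000, max_seq_length=128)

# Download and process the corpus; a local copy is cached in .npz format.
data = processor.get_dataset()

# Build the train/validation/test splits with one of the 'build dataset' functions, e.g.
# train_set = processor.build_dataset_for_training(data)  # hypothetical name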

Note: All of these steps are already implemented in each of the run scripts; you simply need to specify the task_name and pre-processing parameters in the experiment_params.

Currently the data processor supports 4 different corpora:

Corpus  | Num Classes | Vocabulary Size | Max Utterance Length (mean) | Total Utterances | Training Utterances | Validation Utterances | Test Utterances
SwDA    | 41          | 22301           | 133 (9.6)                   | 199740           | 192390              | 3272                  | 4078
Maptask | 12          | 1797            | 115 (6.2)                   | 26743            | 21052               | 2929                  | 2762
MRDA    | 5/12/52     | 10866           | 85 (8)                      | 108202           | 75067               | 16433                 | 16702
Oasis   | 42          | 2230            | 449 (9.7)                   | 15067            | 12076               | 1502                  | 1489

Embeddings

Embeddings are generated with the embedding_processor.py script by running the embedding_processor.get_embedding() function with the appropriate parameters. Word2Vec, GloVe and FastText embeddings are automatically downloaded with GluonNLP. The Dependency and Numberbatch embeddings must be downloaded separately and stored in the embeddings directory.

Note: This is already implemented in each of the run scripts, and is only necessary for the supervised models. You only need to specify the embedding_type, embedding_dim and embedding_source (file name) in the experiment_params.
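For illustration, generating an embedding matrix manually looks roughly like the following; the parameter names mirror the experiment_params above, but the exact get_embedding() signature is an assumption.

# Illustrative call only; see embedding_processor.py for the real signature.
import embedding_processor

embedding_matrix = embedding_processor.get_embedding(
    embedding_type="glove",
    embedding_dim=50,
    embedding_source="glove.6B.50d",
)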

Currently the embedding processor supports 6 different embedding types.

Results

Results and analysis for Switchboard and Maptask data can be viewed in the swda_results.ipynb and maptask_results.ipynb Jupyter notebooks.

All of the data for individual experiment types can be viewed in the Comet project.


Citation

If you are using any code or data from this project in your work please cite: Duran, N., Battle, S., & Smith, J. (2021). Sentence encoding for Dialogue Act classification. Natural Language Engineering, 1-30. doi:10.1017/S1351324921000310

About

Supervised and Semi-supervised Sentence Encoding Methods for Dialogue Act Classification

License: GNU General Public License v3.0


Languages

Jupyter Notebook: 89.8%, Python: 10.2%