Multi-Task Deep Neural Networks for Natural Language Understanding

This PyTorch package implements the Multi-Task Deep Neural Networks (MT-DNN) for Natural Language Understanding, as described in:

Xiaodong Liu*, Pengcheng He*, Weizhu Chen and Jianfeng Gao
Multi-Task Deep Neural Networks for Natural Language Understanding
arXiv version: https://arxiv.org/abs/1901.11504
*: Equal contribution

Quickstart

Setup Environment

Install via pip:

  1. Python 3.6

  2. Install the requirements
    > pip install -r requirements.txt
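
    For isolation, you may first want a dedicated Python 3.6 environment. A minimal sketch, assuming conda is installed (a plain virtualenv works just as well):
    > conda create -n mt-dnn python=3.6
    > conda activate mt-dnn
    > pip install -r requirements.txt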

Use docker:

  1. Pull the Docker image
    > docker pull allenlao/pytorch-mt-dnn:v0.1

  2. Run the container
    > docker run -it --rm --runtime nvidia allenlao/pytorch-mt-dnn:v0.1 bash
    If this is your first time using Docker, please refer to: https://docs.docker.com/
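
    To make the repository and your data visible inside the container, you can mount the working directory. A sketch only: the /workspace mount point is an assumption, not a path prescribed by the image.
    > docker run -it --rm --runtime nvidia -v $(pwd):/workspace allenlao/pytorch-mt-dnn:v0.1 bash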

Train a toy MT-DNN model

  1. Download the data
    > sh download.sh
    Please refer to https://gluebenchmark.com/ for details on the GLUE dataset.

  2. Preprocess the data
    > python prepro.py

  3. Train
    > python train.py

Note that we ran experiments on 4 V100 GPUs for the base MT-DNN models. You may need to reduce the batch size for other GPUs.
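
If your GPUs have less memory, you can pass a smaller batch size on the command line. A minimal sketch, assuming train.py exposes a --batch_size flag (check python train.py --help for the exact option name):
  > python train.py --batch_size 16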

Reproducing the GLUE Results

  1. MTL refinement: refine the shared layers of MT-DNN, initialized with the pre-trained BERT model, via multi-task learning on all GLUE tasks excluding WNLI, to learn a new shared representation.
    Note that we ran this experiment on 8 V100 GPUs (32G) with a batch size of 32.

    • Preprocess the GLUE data via the script above
    • Training:
      > sh scripts/run_mt_dnn.sh
  2. Finetuning: finetune MT-DNN on each of the GLUE tasks to get task-specific models.
    Here, we provide two examples, STS-B and RTE. You can use similar scripts to finetune all the GLUE tasks (see the sketch after this list).

    • Finetune on the STS-B task
      > sh scripts/run_stsb.sh
      You should get about 90.5/90.4 on STS-B dev in terms of Pearson/Spearman correlation.
    • Finetune on the RTE task
      > sh scripts/run_rte.sh
      You should get about 83.8 on RTE dev in terms of accuracy.
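
To finetune on another GLUE task, you can copy one of the provided scripts and swap the task name. A sketch only: run_mnli.sh is a hypothetical name, not a file shipped with this repo, and the exact edits depend on the script's contents.
  > cp scripts/run_rte.sh scripts/run_mnli.sh
  # edit the task/dataset names inside scripts/run_mnli.sh, then:
  > sh scripts/run_mnli.sh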

Reproducing the SciTail & SNLI Results (Domain Adaptation)

  1. Domain Adaptation on SciTail
    > sh scripts/scitail_domain_adaptation_bash.sh

  2. Domain Adaptation on SNLI
    > sh scripts/snli_domain_adaptation_bash.sh

Notes and Acknowledgments

The PyTorch implementation of BERT is from: https://github.com/huggingface/pytorch-pretrained-BERT
BERT: https://github.com/google-research/bert
We also used some code from: https://github.com/kevinduh/san_mrc

How do I cite MT-DNN?

For now, please cite the arXiv version:

@article{liu2019mt-dnn,
  title={Multi-Task Deep Neural Networks for Natural Language Understanding},
  author={Liu, Xiaodong and He, Pengcheng and Chen, Weizhu and Gao, Jianfeng},
  journal={arXiv preprint arXiv:1901.11504},
  year={2019}
}

A new version of the paper will be shared later.

Typo: there is no activation function in Equation 2.

Contact Information

For help or issues using MT-DNN, please submit a GitHub issue.

For personal communication related to MT-DNN, please contact Xiaodong Liu (xiaodl@microsoft.com), Pengcheng He (penhe@microsoft.com), Weizhu Chen (wzchen@microsoft.com) or Jianfeng Gao (jfgao@microsoft.com).
