# MultiSpanQA: A Dataset for Multi-Span Question Answering
This repo provides the source code and data for our paper MultiSpanQA: A Dataset for Multi-Span Question Answering (NAACL 2022). If you use the dataset, please cite:
```bibtex
@inproceedings{li2022multispanqa,
  title = {MultiSpanQA: A Dataset for Multi-Span Question Answering},
  author = {Li, Haonan and Tomko, Martin and Vasardani, Maria and Baldwin, Timothy},
  booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  pages = {1250--1260},
  year = {2022}
}
```
Leaderboard: https://multi-span.github.io.
## Requirements
- Python >= 3.7
- PyTorch >= 1.8.1
- transformers >= 4.17.0 (the Hugging Face library)
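
If you use pip, something like `pip install "torch>=1.8.1" "transformers>=4.17.0"` should cover the PyTorch and transformers requirements; the exact PyTorch build you need depends on your CUDA setup.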
## Fine-tune BERT tagger on MultiSpanQA (Recommended)
```bash
python run_tagger.py \
    --model_name_or_path bert-base-uncased \
    --data_dir ../data/MultiSpanQA_data \
    --output_dir ../output \
    --overwrite_output_dir \
    --overwrite_cache \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 4 \
    --eval_accumulation_steps 50 \
    --learning_rate 3e-5 \
    --num_train_epochs 3 \
    --max_seq_length 512 \
    --doc_stride 128
```
To try other encoders, replace the model name `bert-base-uncased` with another model name; we currently support `bert-large-uncased`, `roberta-base`, and `roberta-large`.
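
The tagger casts multi-span extraction as sequence labelling: every context token gets a BIO label, so any number of answer spans can be predicted at once. Below is a minimal sketch of this view, assuming whitespace tokenization; the variable names and preprocessing are illustrative, not the exact logic in `run_tagger.py`:

```python
# Illustration only: multi-span answers as BIO tags over context tokens.
# Whitespace tokenization is a simplification of the real preprocessing.
context = "Mercury Venus and Mars are rocky planets"
answers = ["Mercury", "Venus", "Mars"]

tokens = context.split()
labels = ["O"] * len(tokens)
for answer in answers:
    span = answer.split()
    for i in range(len(tokens) - len(span) + 1):
        if tokens[i:i + len(span)] == span:
            labels[i] = "B"  # first token of an answer span
            for j in range(1, len(span)):
                labels[i + j] = "I"  # continuation of the span

print(list(zip(tokens, labels)))
# [('Mercury', 'B'), ('Venus', 'B'), ('and', 'O'), ('Mars', 'B'),
#  ('are', 'O'), ('rocky', 'O'), ('planets', 'O')]
```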
You should get results similar to the following:
| Encoder | Exact Match Precision | Exact Match Recall | Exact Match F1 | Partial Match Precision | Partial Match Recall | Partial Match F1 |
|---|---|---|---|---|---|---|
| BERT-base | 55.53 | 63.51 | 59.25 | 76.71 | 75.52 | 76.11 |
| BERT-large | 59.25 | 64.47 | 61.75 | 78.79 | 77.24 | 78.01 |
| RoBERTa-base | 61.43 | 67.30 | 64.23 | 80.72 | 79.83 | 80.27 |
| RoBERTa-large | 66.02 | 71.84 | 68.81 | 84.16 | 84.61 | 84.39 |
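
Exact match only credits a predicted span that matches a gold span exactly, while partial match also gives credit for overlapping spans. For intuition, here is a rough sketch of decoding BIO predictions back into spans before scoring; it is an illustration, not the official evaluation code:

```python
# Illustration only: decode contiguous B/I runs in a predicted tag sequence
# back into answer spans (not the official evaluation script).
def decode_spans(tokens, labels):
    spans, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B":
            if current:
                spans.append(" ".join(current))
            current = [token]  # start a new span
        elif label == "I" and current:
            current.append(token)  # extend the current span
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

print(decode_spans(
    ["Mercury", "Venus", "and", "Mars", "are", "rocky", "planets"],
    ["B", "B", "O", "B", "O", "O", "O"],
))  # ['Mercury', 'Venus', 'Mars']
```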
## Fine-tune Huggingface QA model on MultiSpanQA
Since the Huggingface QA model is a single-span model, you first need to convert MultiSpanQA into a format that a single-span model can be trained on, by running:
```bash
python generate_squad_format.py
```
This will generate two training files in SQuAD format.
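
As a rough sketch of the conversion idea, each multi-span example can be flattened so that a single-span model sees one gold span at a time. The field names below follow the SQuAD schema, but the exact contents of the generated files (including how the v1 and v2 variants differ) are determined by `generate_squad_format.py`:

```python
# Illustration only: one plausible way to flatten a multi-span example into
# SQuAD-style single-span examples; not the exact output of the script.
multi_span_example = {
    "id": "example-0",
    "question": "Which planets are rocky?",
    "context": "Mercury Venus and Mars are rocky planets",
    "answers": ["Mercury", "Venus", "Mars"],
}

squad_examples = []
for i, answer in enumerate(multi_span_example["answers"]):
    squad_examples.append({
        "id": f"{multi_span_example['id']}-{i}",
        "question": multi_span_example["question"],
        "context": multi_span_example["context"],
        "answers": {
            "text": [answer],
            "answer_start": [multi_span_example["context"].find(answer)],
        },
    })

print(len(squad_examples))  # 3 single-span examples from 1 multi-span example
```

You can then fine-tune BERT on one of the generated files (for example, v1) using: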
```bash
python run_squad.py \
    --model_name_or_path bert-base-uncased \
    --train_file ../data/MultiSpan_data/squad_train_softmax_v1.json \
    --validation_file ../data/MultiSpan_data/squad_valid.json \
    --output_dir ../output \
    --overwrite_output_dir \
    --overwrite_cache \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 4 \
    --eval_accumulation_steps 50 \
    --learning_rate 3e-5 \
    --num_train_epochs 3 \
    --max_seq_length 512 \
    --doc_stride 128
```
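
At inference time, a single-span model predicts one span per pass, so multiple answers typically have to be recovered from its n-best candidate list. The sketch below shows one plausible heuristic: keep high-probability candidates that do not overlap. The field names and the threshold are hypothetical, not the post-processing this repo actually uses:

```python
# Illustration only: selecting multiple answer spans from a single-span
# model's n-best list. Field names and the 0.1 threshold are hypothetical.
nbest = [
    {"text": "Mercury", "start": 0, "end": 7, "probability": 0.62},
    {"text": "Mars", "start": 18, "end": 22, "probability": 0.21},
    {"text": "Mercury Venus", "start": 0, "end": 13, "probability": 0.09},
]

selected = []
for cand in sorted(nbest, key=lambda c: c["probability"], reverse=True):
    # Reject candidates whose character range overlaps an accepted span.
    overlaps = any(
        cand["start"] < s["end"] and s["start"] < cand["end"] for s in selected
    )
    if not overlaps and cand["probability"] > 0.1:
        selected.append(cand)

print([s["text"] for s in selected])  # ['Mercury', 'Mars']
```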