# MultiSpanQA: A Dataset for Multi-Span Question Answering
This repo provides the source code and data for our paper MultiSpanQA: A Dataset for Multi-Span Question Answering (NAACL 2022). If you use the dataset, please cite:
```bibtex
@inproceedings{li2022multispanqa,
  title = {MultiSpanQA: A Dataset for Multi-Span Question Answering},
  author = {Li, Haonan and Tomko, Martin and Vasardani, Maria and Baldwin, Timothy},
  booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  pages = {1250--1260},
  year = {2022}
}
```
Leaderboard: https://multi-span.github.io.
## Requirements
- Python >= 3.7
- PyTorch >= 1.8.1
- transformers >= 4.17.0 (the Hugging Face library)
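
If you use pip, something like `pip install "torch>=1.8.1" "transformers>=4.17.0"` should cover the PyTorch and transformers requirements; the exact PyTorch build you need depends on your CUDA setup.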
## Fine-tune BERT tagger on MultiSpanQA (Recommended)
```bash
python run_tagger.py \
    --model_name_or_path bert-base-uncased \
    --data_dir ../data/MultiSpanQA_data \
    --output_dir ../output \
    --overwrite_output_dir \
    --overwrite_cache \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 4 \
    --eval_accumulation_steps 50 \
    --learning_rate 3e-5 \
    --num_train_epochs 3 \
    --max_seq_length 512 \
    --doc_stride 128
```
To try other encoders, replace the model name `bert-base-uncased` with another model name; we currently support `bert-large-uncased`, `roberta-base`, and `roberta-large`.
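
The tagger casts multi-span extraction as sequence labelling: every context token gets a BIO label, so any number of answer spans can be predicted at once. Below is a minimal sketch of this view, assuming whitespace tokenization; the variable names and preprocessing are illustrative, not the exact logic in `run_tagger.py`:

```python
# Illustration only: multi-span answers as BIO tags over context tokens.
# Whitespace tokenization is a simplification of the real preprocessing.
context = "Mercury Venus and Mars are rocky planets"
answers = ["Mercury", "Venus", "Mars"]

tokens = context.split()
labels = ["O"] * len(tokens)
for answer in answers:
    span = answer.split()
    for i in range(len(tokens) - len(span) + 1):
        if tokens[i:i + len(span)] == span:
            labels[i] = "B"  # first token of an answer span
            for j in range(1, len(span)):
                labels[i + j] = "I"  # continuation of the span

print(list(zip(tokens, labels)))
# [('Mercury', 'B'), ('Venus', 'B'), ('and', 'O'), ('Mars', 'B'),
#  ('are', 'O'), ('rocky', 'O'), ('planets', 'O')]
```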
You should get results similar to the following:
| Encoder | Exact Match Precision | Exact Match Recall | Exact Match F1 | Partial Match Precision | Partial Match Recall | Partial Match F1 |
|---|---|---|---|---|---|---|
| BERT-base | 55.53 | 63.51 | 59.25 | 76.71 | 75.52 | 76.11 |
| BERT-large | 59.25 | 64.47 | 61.75 | 78.79 | 77.24 | 78.01 |
| RoBERTa-base | 61.43 | 67.30 | 64.23 | 80.72 | 79.83 | 80.27 |
| RoBERTa-large | 66.02 | 71.84 | 68.81 | 84.16 | 84.61 | 84.39 |
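
Exact match only credits a predicted span that matches a gold span exactly, while partial match also gives credit for overlapping spans. For intuition, here is a rough sketch of decoding BIO predictions back into spans before scoring; it is an illustration, not the official evaluation code:

```python
# Illustration only: decode contiguous B/I runs in a predicted tag sequence
# back into answer spans (not the official evaluation script).
def decode_spans(tokens, labels):
    spans, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B":
            if current:
                spans.append(" ".join(current))
            current = [token]  # start a new span
        elif label == "I" and current:
            current.append(token)  # extend the current span
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

print(decode_spans(
    ["Mercury", "Venus", "and", "Mars", "are", "rocky", "planets"],
    ["B", "B", "O", "B", "O", "O", "O"],
))  # ['Mercury', 'Venus', 'Mars']
```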
## Fine-tune Huggingface QA model on MultiSpanQA
Since the Huggingface QA model is a single-span model, you first need to convert MultiSpanQA into a format that a single-span model can be trained on, by running:
```bash
python generate_squad_format.py
```
This will generate two training files in SQuAD format.
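
As a rough sketch of the conversion idea, each multi-span example can be flattened so that a single-span model sees one gold span at a time. The field names below follow the SQuAD schema, but the exact contents of the generated files (including how the v1 and v2 variants differ) are determined by `generate_squad_format.py`:

```python
# Illustration only: one plausible way to flatten a multi-span example into
# SQuAD-style single-span examples; not the exact output of the script.
multi_span_example = {
    "id": "example-0",
    "question": "Which planets are rocky?",
    "context": "Mercury Venus and Mars are rocky planets",
    "answers": ["Mercury", "Venus", "Mars"],
}

squad_examples = []
for i, answer in enumerate(multi_span_example["answers"]):
    squad_examples.append({
        "id": f"{multi_span_example['id']}-{i}",
        "question": multi_span_example["question"],
        "context": multi_span_example["context"],
        "answers": {
            "text": [answer],
            "answer_start": [multi_span_example["context"].find(answer)],
        },
    })

print(len(squad_examples))  # 3 single-span examples from 1 multi-span example
```

You can then fine-tune BERT on one of the generated files (for example, v1) using: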
```bash
python run_squad.py \
    --model_name_or_path bert-base-uncased \
    --train_file ../data/MultiSpan_data/squad_train_softmax_v1.json \
    --validation_file ../data/MultiSpan_data/squad_valid.json \
    --output_dir ../output \
    --overwrite_output_dir \
    --overwrite_cache \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 4 \
    --eval_accumulation_steps 50 \
    --learning_rate 3e-5 \
    --num_train_epochs 3 \
    --max_seq_length 512 \
    --doc_stride 128
```
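
At inference time, a single-span model predicts one span per pass, so multiple answers typically have to be recovered from its n-best candidate list. The sketch below shows one plausible heuristic: keep high-probability candidates that do not overlap. The field names and the threshold are hypothetical, not the post-processing this repo actually uses:

```python
# Illustration only: selecting multiple answer spans from a single-span
# model's n-best list. Field names and the 0.1 threshold are hypothetical.
nbest = [
    {"text": "Mercury", "start": 0, "end": 7, "probability": 0.62},
    {"text": "Mars", "start": 18, "end": 22, "probability": 0.21},
    {"text": "Mercury Venus", "start": 0, "end": 13, "probability": 0.09},
]

selected = []
for cand in sorted(nbest, key=lambda c: c["probability"], reverse=True):
    # Reject candidates whose character range overlaps an accepted span.
    overlaps = any(
        cand["start"] < s["end"] and s["start"] < cand["end"] for s in selected
    )
    if not overlaps and cand["probability"] > 0.1:
        selected.append(cand)

print([s["text"] for s in selected])  # ['Mercury', 'Mars']
```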