
LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework


Introduction

LibriSQA, built on LibriSpeech [1], is the first free-form and open-ended spoken question answering (SQA) dataset tailored for end-to-end SQA training of large language models (LLMs), featuring genuine speech and audio lengths suitable for LLMs. It has two parts: Part I contains natural-dialogue question-answer pairs, and Part II follows a multiple-choice format with correct answers and accompanying analysis. Using LibriSQA, we introduce a speech-text multimodal training framework that handles tasks such as common-sense question answering, automatic speech recognition (ASR), natural-dialogue SQA, and multiple-choice SQA, showing that models trained on LibriSQA achieve strong speech-text alignment and efficiently leverage multimodal data.

We have released the dataset; the code will be released soon.

Usage

1. Download the speech from LibriSpeech

Training: https://www.openslr.org/resources/12/train-clean-360.tar.gz

Testing: https://www.openslr.org/resources/12/test-clean.tar.gz
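
A minimal Python sketch for fetching and unpacking both archives (the `data/` output directory is an arbitrary choice for illustration, not something this repo prescribes):

```python
import tarfile
import urllib.request
from pathlib import Path

# URLs from the step above (train-clean-360 for training, test-clean for testing).
LIBRISPEECH_URLS = [
    "https://www.openslr.org/resources/12/train-clean-360.tar.gz",
    "https://www.openslr.org/resources/12/test-clean.tar.gz",
]

def download_and_extract(url: str, out_dir: str = "data") -> None:
    """Download a LibriSpeech tarball and unpack it under `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    archive = out / url.rsplit("/", 1)[-1]
    if not archive.exists():  # skip re-downloading (train-clean-360 is ~23 GB)
        urllib.request.urlretrieve(url, str(archive))
    with tarfile.open(archive, "r:gz") as tar:
        # Yields data/LibriSpeech/{train-clean-360,test-clean}/...
        tar.extractall(out)

if __name__ == "__main__":
    for url in LIBRISPEECH_URLS:
        download_and_extract(url)
```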

2. Download LibriSQA

The dataset is available on Hugging Face.
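
If you use the `huggingface_hub` library, the files can also be fetched programmatically. A minimal sketch, assuming the dataset is hosted under a repo id like `ZihanZhaoSJTU/LibriSQA` (a placeholder here; substitute the actual id shown on the Hugging Face page):

```python
from huggingface_hub import snapshot_download

# NOTE: the repo id below is a placeholder -- replace it with the actual
# dataset id from the Hugging Face page linked above.
local_dir = snapshot_download(
    repo_id="ZihanZhaoSJTU/LibriSQA",
    repo_type="dataset",
    local_dir="data/LibriSQA",
)
print(f"LibriSQA files downloaded to: {local_dir}")
```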

Model

Demo

1. Automatic speech recognition (ASR)

Trained on LibriSpeech [1] in the SQA format.

2. Automatic speech recognition (ASR) without LibriSpeech

Trained on our LibriSQA without any speech-text pairs.

3. LibriSQA Part I

Trained on LibriSQA Part I.

4. LibriSQA Part II

Trained on LibriSQA Part II.

Acknowledgement

[1] LibriSpeech: An ASR corpus based on public domain audio books -- https://ieeexplore.ieee.org/abstract/document/7178964

[2] LLaMA: Open and Efficient Foundation Language Models -- https://arxiv.org/abs/2302.13971

[3] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention -- https://arxiv.org/abs/2303.16199

[4] PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering -- https://arxiv.org/abs/2305.10415

We thank the authors for their great ideas and open-sourced code, which helped us with this paper.

Contribution

Please raise an issue if you need help; any contributions are welcome.

Citation

If you use LibriSQA in your research, please cite our paper:

@article{zhao2023librisqa,
  title={LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework},
  author={Zhao, Zihan and Jiang, Yiyang and Liu, Heyang and Wang, Yanfeng and Wang, Yu},
  journal={arXiv preprint arXiv:2308.10390},
  year={2023}
}
