LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework
Introduction
LibriSQA, built on LibriSpeech [1], is the first free-form and open-ended spoken question answering (SQA) dataset tailored for end-to-end SQA training of large language models (LLMs), featuring genuine speech and utterance lengths suitable for LLMs. It has two parts: Part I contains natural-dialogue question-answer pairs, and Part II follows a multiple-choice format with the correct answer and an accompanying analysis. Using LibriSQA, we introduce a speech-text multimodal training framework that handles common-sense question answering, automatic speech recognition (ASR), natural-dialogue SQA, and multiple-choice SQA, showing that models trained on LibriSQA can achieve strong speech-text alignment while leveraging multimodal data efficiently.
We have released the dataset; the code will be released soon.
Usage
1. Download the speech from LibriSpeech
Training: https://www.openslr.org/resources/12/train-clean-360.tar.gz
Testing: https://www.openslr.org/resources/12/test-clean.tar.gz
2. Download LibriSQA
The dataset is available on Hugging Face.
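LibriSQA ships the question-answer text, while the audio comes from the LibriSpeech archives above, so the two need to be joined locally. A minimal sketch, assuming each LibriSQA entry references its utterance by the standard LibriSpeech ID (`speaker-chapter-utterance`); the `entry` dictionary and its field names below are illustrative, not the dataset's actual schema:

```python
from pathlib import Path

def librispeech_audio_path(utterance_id: str, root: str = "LibriSpeech",
                           split: str = "train-clean-360") -> Path:
    """Map a LibriSpeech utterance ID such as '103-1240-0000' to its .flac
    file, following LibriSpeech's speaker/chapter directory layout."""
    speaker, chapter, _ = utterance_id.split("-")
    return Path(root) / split / speaker / chapter / f"{utterance_id}.flac"

# Example: resolve the audio file for one (hypothetical) LibriSQA entry.
entry = {"id": "103-1240-0000", "question": "...", "answer": "..."}
print(librispeech_audio_path(entry["id"]))
# LibriSpeech/train-clean-360/103/1240/103-1240-0000.flac
```

Pass `split="test-clean"` for entries drawn from the test archive; the same speaker/chapter layout applies to both splits once the tarballs are extracted.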
Model
Demo
1. Automatic speech recognition (ASR)
Trained on LibriSpeech [1] reformatted into the SQA format.
2. Automatic speech recognition (ASR) without LibriSpeech
Trained on our LibriSQA alone, without any paired speech-text (ASR) data.
3. LibriSQA Part I
Trained with LibriSQA Part I.
4. LibriSQA Part II
Trained with LibriSQA Part II.
Acknowledgement
[1] LibriSpeech: An ASR corpus based on public domain audio books -- https://ieeexplore.ieee.org/abstract/document/7178964
[2] LLaMA: Open and Efficient Foundation Language Models -- https://arxiv.org/abs/2302.13971
[3] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention -- https://arxiv.org/abs/2303.16199
[4] PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering -- https://arxiv.org/abs/2305.10415
We thank the authors of these works for their great ideas and open-source code, which helped us with this work.
Contribution
Please raise an issue if you need help; any contributions are welcome.
Citation
If you use LibriSQA in your research, please cite our paper:
@article{zhao2023librisqa,
title={LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework},
author={Zhao, Zihan and Jiang, Yiyang and Liu, Heyang and Wang, Yanfeng and Wang, Yu},
journal={arXiv preprint arXiv:2308.10390},
year={2023}
}