Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework

This repo has the code for the paper "Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework" accepted at EMNLP 2021 Findings. The blog on this paper can be found here, the poster here, and a corresponding presentation here.

Required dependencies

Please run pip install -r requirements.txt (Python 3 is required).

E-Manual pre-training corpus

The corpus is available at this link. A RoBERTa-base model pre-trained on the corpus can be found here, and a BERT-base-uncased model pre-trained on the same corpus here.

Code

Annotated data and Amazon user forum data samples are in data (see README)
Data analysis is in data_analysis (see README)
Corpus extraction code is in pre_training_corpus_extraction (see README)
E-Manual data extraction code is in EManual_data_extraction (see README)
Pre-training code is in pre-training (see README)
Code for the unsupervised IR method and the fine-tuning variants is in fine_tuning_variants_scripts (see README)
Multi-task learning code is in MTL_scripts (see README)
Code for the functions that evaluate MTL and the fine-tuning variants is in evaluation (see README)
- For ROUGE-L Precision, Recall and F1-Score: https://pypi.org/project/py-rouge/
- For S+WMS: https://github.com/eaclark07/sms
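ROUGE-L scores a predicted answer against a reference by their longest common subsequence (LCS) of tokens: precision is LCS / candidate length, recall is LCS / reference length, and F is their weighted harmonic mean. The sketch below illustrates that computation only; it assumes lowercase whitespace tokenization and beta = 1, and the function names and example strings are hypothetical. It does not reproduce py-rouge's exact tokenization, stemming, or weighting, so use py-rouge for reported numbers.

```python
from typing import List, Tuple


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(b)]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> Tuple[float, float, float]:
    """Hypothetical helper: ROUGE-L (precision, recall, F) with naive lowercase whitespace tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    f = (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
    return precision, recall, f


if __name__ == "__main__":
    print(rouge_l("press the power button", "hold the power button for five seconds"))
```

For the example, the LCS is "the power button" (3 tokens), giving precision 3/4, recall 3/7 and F1 6/11.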

Baselines

Dense Passage Retrieval (DPR) - used the HuggingFace implementation (https://huggingface.co/transformers/model_doc/dpr.html)
Technical Answer Prediction (TAP) - adapted from the code in https://github.com/IBM/techqa
MultiSpan - adapted from the code in https://github.com/eladsegal/tag-based-multi-span-extraction
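DPR ranks passages by the inner product between a question embedding and each passage embedding, both produced by separate encoders. The sketch below shows only that ranking step; the short hand-written vectors are hypothetical stand-ins for encoder outputs, and the function names are illustrative, not part of the HuggingFace API.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(question_emb: Sequence[float],
                  passage_embs: List[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs sorted by descending inner-product score."""
    scores = [(i, dot(question_emb, p)) for i, p in enumerate(passage_embs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings; real DPR encoders output 768-dimensional vectors.
    question = [0.2, 0.9, 0.1]
    passages = [[0.1, 0.1, 0.9], [0.3, 0.8, 0.0], [0.9, 0.1, 0.1]]
    print(rank_passages(question, passages))
```

Here passage 1 scores highest (about 0.78), followed by passages 2 and 0.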

Citation

Please cite this work if you use it.

@article{nandy2021question,
  title={Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework},
  author={Nandy, Abhilash and Sharma, Soumya and Maddhashiya, Shubham and Sachdeva, Kapil and Goyal, Pawan and Ganguly, Niloy},
  journal={arXiv preprint arXiv:2109.05897},
  year={2021}
}
