Official code for the EACL'24 SRW paper "Distribution Shifts Are Bottlenecks: Extensive Evaluation for Grounding Language Models to Knowledge Bases".
[arXiv] [Proceedings] [🤗 Datasets] [BibTeX]
This repo contains a data augmentation method named Graph seArch and questIon generatioN (GAIN). GAIN can be used to augment any neural KBQA model. For the TIARA model used in this paper, please check this repo.
If you find this paper or repo useful, please cite:
@inproceedings{shu-yu-2024-distribution,
title = "Distribution Shifts Are Bottlenecks: Extensive Evaluation for Grounding Language Models to Knowledge Bases",
author = "Shu, Yiheng and
Yu, Zhiwei",
editor = "Falk, Neele and
Papi, Sara and
Zhang, Mike",
booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop",
month = mar,
year = "2024",
address = "St. Julian{'}s, Malta",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.eacl-srw.7",
pages = "71--88"
}
Please follow Freebase Setup to set up a Virtuoso service. Note that at least 30G of RAM and 53G+ of disk space are needed for Freebase Virtuoso. The download may take some time. The default port of this service is localhost:3001. If you change the port of the Virtuoso service, please also modify the Freebase port setting in utils/config.py.
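To quickly check that the endpoint is reachable, you can run a trivial SPARQL query against it. This is a minimal sketch, assuming the default localhost:3001 port and the standard Virtuoso /sparql path (adjust both if your setup differs):

```python
# Minimal connectivity check for the local Freebase Virtuoso endpoint.
# Assumes the default port (localhost:3001) and the standard Virtuoso
# SPARQL path (/sparql); change both if you configured them differently.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:3001/sparql")
sparql.setQuery("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
print(results["results"]["bindings"])  # one arbitrary triple if the service is up
```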
The working directory for the following commands is src.
Graph search for logical form:
python algorithm/graph_query/logical_form_search.py --domain synthetic --output_dir ../dataset/question_generation
Graph search for triple:
python algorithm/graph_query/triple_search.py --output_dir ../dataset/question_generation
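Conceptually, the triple search enumerates KB triples around seed entities via SPARQL. The sketch below only illustrates that idea with a hypothetical one-hop helper; it is not the repo's implementation, and the endpoint URL and seed MID are placeholders:

```python
# Illustrative one-hop triple search around a seed Freebase entity.
# Simplified sketch of the idea, not the repo's actual graph search;
# the endpoint URL and the seed MID below are placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://localhost:3001/sparql"

def one_hop_triples(mid: str, limit: int = 100):
    """Return (subject, predicate, object) triples with the seed entity as subject."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(f"""
        PREFIX ns: <http://rdf.freebase.com/ns/>
        SELECT ?p ?o WHERE {{ ns:{mid} ?p ?o }} LIMIT {limit}
    """)
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    return [(mid, r["p"]["value"], r["o"]["value"]) for r in rows]

# Example seed: m.02mjmr (Barack Obama)
print(one_hop_triples("m.02mjmr")[:5])
```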
If the QG models have already been trained, training is skipped and only verbalization is performed. In this step, you can use our implementation directly or modify the code to train a verbalizer on any KBQA dataset with logical form / triple annotations.
Training QG model for logical form (checkpoint):
python algorithm/question_generation/logical_form_question_generation.py --model_dir ../model/logical_form_question_generation
Training QG model for triple (checkpoint):
python algorithm/question_generation/triple_question_generation.py --model_dir ../model/triple_question_generation
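If you want to train a verbalizer on your own annotations, the core is a standard sequence-to-sequence mapping from a logical form (or triple) string to a natural-language question. Below is a minimal sketch with Hugging Face Transformers, assuming a T5-style model and toy in-memory data; the scripts above are the actual entry points and handle data loading, hyperparameters, and checkpointing.

```python
# Minimal seq2seq fine-tuning sketch for question generation (verbalization).
# Assumptions: a T5-style model and tiny toy data purely for illustration;
# use the repo's scripts above for real training.
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments,
                          DataCollatorForSeq2Seq)
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

# Toy (logical form, question) pairs; real data comes from the graph search step.
pairs = [
    {"source": "(AND music.album (JOIN music.album.artist m.0xyz))",
     "target": "which albums were released by this artist?"},
]

def preprocess(example):
    model_inputs = tokenizer(example["source"], truncation=True, max_length=256)
    labels = tokenizer(text_target=example["target"], truncation=True, max_length=64)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

dataset = Dataset.from_list(pairs).map(preprocess, remove_columns=["source", "target"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="../model/qg_sketch",
                                  per_device_train_batch_size=1,
                                  num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```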
How to use the synthetic data and which KBQA model to use are up to you. In this paper, the synthetic dataset is used to pre-train a KBQA model, which is then fine-tuned on each KBQA dataset separately.
We modify the official evaluation scripts of GrailQA and GraphQuestions for paraphrase adaptation, i.e., utils/statistics/grailqa_evaluate.py and utils/statistics/graphq_evaluate.py.
To evaluate your QA results with utils/statistics/graphq_evaluate.py, you may need to generate a result template via utils/statistics/graphq_evaluate_template.py. The template is based on this result format.
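The referenced result format is a GrailQA-style prediction file: one JSON object per line with the question id, predicted logical form, and answers. The field names below follow the public GrailQA prediction format and are an assumption here; verify them against the result format linked above before running the evaluation scripts.

```python
# Sketch of writing predictions in a GrailQA-style result format:
# one JSON object per line with question id, predicted logical form, and answers.
# Field names are assumed from the public GrailQA format; verify against the
# result format linked above. Values below are toy placeholders.
import json

predictions = [
    {"qid": "example_0",
     "logical_form": "(AND common.topic (JOIN common.topic.notable_types m.0xyz))",
     "answer": ["m.0abc"]},
]

with open("predictions.jsonl", "w", encoding="utf-8") as f:
    for pred in predictions:
        f.write(json.dumps(pred) + "\n")
```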
Datasets and retrieval results using TIARA and TIARA + GAIN can be found at 🤗 Datasets:
- GrailQA
- GraphQuestions Freebase 2013 version
- GraphQuestions Freebase 2015 version
- WebQuestionsSP
- SimpleQuestions - Balanced
- GrailQA exemplary logical form retrieval (TIARA + GAIN)
- GrailQA schema retrieval (TIARA + GAIN)
- GraphQuestions exemplary logical form retrieval (TIARA)
- GraphQuestions exemplary logical form retrieval (TIARA + GAIN)
- GraphQuestions schema retrieval (TIARA)
- GraphQuestions schema retrieval (TIARA + GAIN)