
DecAF: Joint Decoding Answer and Logical Form for KBQA through Retrieval

This is the official implementation of our paper "DecAF: Joint Decoding Answer and Logical Form for Knowledge Base Question Answering through Retrieval", published at ICLR 2023 (OpenReview, arXiv).

1. Set up

conda create -n DecAF python=3.8
conda activate DecAF
pip install -r requirements.txt
pip install -e .
source config.sh ${your base directory to store data, models, and results}
conda activate DecAF

Then follow the instructions in https://github.com/dki-lab/Freebase-Setup to set up the Freebase Virtuoso server, which is required for data preprocessing and evaluation.
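
Once the Virtuoso server is running, a quick way to confirm it is reachable is to issue a small SPARQL query. The sketch below is not part of this repo: it assumes the default port 3001 from Freebase-Setup, uses the SPARQLWrapper package, and queries an arbitrary example MID (m.0d05w3, "China").

from SPARQLWrapper import SPARQLWrapper, JSON

# Quick connectivity check against the local Freebase Virtuoso endpoint.
# Port 3001 is the Freebase-Setup default; adjust if you started it elsewhere.
sparql = SPARQLWrapper("http://localhost:3001/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
PREFIX ns: <http://rdf.freebase.com/ns/>
SELECT ?name WHERE { ns:m.0d05w3 ns:type.object.name ?name . } LIMIT 5
""")
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["name"]["value"])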

2. Preprocessing

Knowledge source

Download the Freebase data, extract it, and place it under ${DATA_DIR}/knowledge_source/Freebase. The directory structure should look like this:

${DATA_DIR}/knowledge_source/Freebase
├── topic_entities_parts
├── triple_edges_parts
├── id2name_parts
├── id2name_parts_disamb
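
Before preprocessing, you can sanity-check that the layout above is in place. This small helper is not part of the repo; it only assumes that config.sh exported the DATA_DIR environment variable.

import os

# Hypothetical sanity check: verify the Freebase dump directories listed above exist.
freebase_dir = os.path.join(os.environ["DATA_DIR"], "knowledge_source", "Freebase")
for name in ["topic_entities_parts", "triple_edges_parts",
             "id2name_parts", "id2name_parts_disamb"]:
    path = os.path.join(freebase_dir, name)
    print(f"{path}: {'ok' if os.path.isdir(path) else 'MISSING'}")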

Preprocess the knowledge source, i.e., Freebase:

python DecAF/Knowledge/process_freebase.py --data_dir ${DATA_DIR}/knowledge_source/Freebase

Datasets

Please see DecAF/Datasets for preprocessing datasets including WebQSP, GrailQA, ComplexWebQuestions, and FreebaseQA.

3. Retrieval

We use Pyserini for BM25 retrieval.

cd DecAF/Retrieval/Pyserini
bash build_index_sparse.sh      # build sparse index for knowledge source
bash run_search_sparse.sh -d GrailQA -s dev     # retrieve from knowledge source

You can change the -d argument to WebQSP, CWQ, or FreebaseQA, and the -s argument to train or test.
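
Both scripts wrap Pyserini's Lucene tooling. If you want to query the sparse index directly from Python, a minimal sketch looks like the following; the index path is a placeholder for whatever build_index_sparse.sh produced, and on older Pyserini versions the class is pyserini.search.SimpleSearcher.

from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher("path/to/your/sparse_index")  # placeholder path
hits = searcher.search("where did natalie portman go to college", k=100)
for hit in hits[:5]:
    # each hit carries a document id, a BM25 score, and the stored raw passage
    print(hit.docid, round(hit.score, 2))
    print(searcher.doc(hit.docid).raw()[:200])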

Running the search script should yield the following retrieval results:

|            | WebQSP (test) | CWQ (test) | GrailQA (dev) | FreebaseQA (test) |
|------------|---------------|------------|---------------|-------------------|
| Hits@100   | 81.3          | 63.4       | 90.1          | 93.6              |
| Recall@100 | 67.8          | 57.5       | 85.0          | 93.6              |
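
Here Hits@100 and Recall@100 measure how well the gold answers are covered by the top-100 retrieved passages. A rough sketch of how such metrics can be computed is below; the repo's own evaluation may normalize strings or match answers differently.

def hits_at_k(passages, gold_answers, k=100):
    """1.0 if any gold answer string occurs in the top-k passages, else 0.0."""
    text = " ".join(passages[:k]).lower()
    return float(any(ans.lower() in text for ans in gold_answers))

def recall_at_k(passages, gold_answers, k=100):
    """Fraction of gold answers that occur somewhere in the top-k passages."""
    text = " ".join(passages[:k]).lower()
    return sum(ans.lower() in text for ans in gold_answers) / max(len(gold_answers), 1)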

If you encounter errors about the Java version, try installing Java 11 and running:

export JAVA_HOME={YOUR_OWN_PATH}/jdk-17.0.3.1

For dense retrieval, we use DPR and train it separately on each dataset. We refer readers to the original DPR repo for details on training and inference. Note that in our experiments, DPR outperforms BM25 only on the WebQSP dataset.
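
As a rough illustration of the dense-retrieval side, the sketch below scores passages for a question with off-the-shelf DPR encoders from Hugging Face Transformers. The checkpoints here are the public NQ ones, used only for illustration; they are not the dataset-specific DPR models trained for DecAF.

import torch
from transformers import (DPRContextEncoder, DPRContextEncoderTokenizer,
                          DPRQuestionEncoder, DPRQuestionEncoderTokenizer)

# Public NQ checkpoints, for illustration only.
q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
c_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
c_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

question = "who founded microsoft"
passages = ["Bill Gates: Bill Gates co-founded Microsoft with Paul Allen.",
            "Seattle: Seattle is a seaport city in Washington."]
with torch.no_grad():
    q_emb = q_enc(**q_tok(question, return_tensors="pt")).pooler_output
    p_emb = c_enc(**c_tok(passages, return_tensors="pt", padding=True, truncation=True)).pooler_output
print((q_emb @ p_emb.T).squeeze(0))  # dot-product relevance scores; higher = more relevant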

4. Reading (Answer Generation)

We use FiD as the reading module, which takes the output of the retrieval module as input. Note that FiD requires transformers==3.0.2, which conflicts with the version required by Pyserini, so we recommend creating a new conda environment for FiD. Remember to run source config.sh ${your base directory to store data, models, and results} again after creating the new environment to set the environment variables.

Process the retrieval results to the format required by FiD:

cd DecAF/Reading
python process_fid.py --retrieval_data_path ${SAVE_DIR}/Retrieval/pyserini/search_results/QA_GrailQA_Freebase_BM25 --mode SPQA

You can change the --mode argument to QA, which is used for FreebaseQA since that dataset does not provide annotated logical forms.
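
For reference, the FiD reader consumes a JSON list of examples shaped roughly as below (field names follow the FiD README). The exact target string that process_fid.py writes in SPQA mode, which covers both answers and logical forms, is DecAF-specific and only hinted at here.

import json

example = {
    "id": "0",
    "question": "where did natalie portman go to college",
    "answers": ["Harvard University"],
    # in SPQA mode the target also encodes the logical form; the exact
    # serialization is produced by process_fid.py
    "target": "Harvard University",
    "ctxs": [
        {"title": "Natalie Portman", "text": "... attended Harvard University ..."},
        # ... one entry per retrieved passage (top 100)
    ],
}
print(json.dumps(example, indent=2))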

Download FiD and replace its reader scripts with our modified versions, which support beam search:

git clone https://github.com/facebookresearch/FiD.git

cp test_reader.py FiD/ 
cp train_reader.py FiD/

Train FiD:

bash bash/run_train.sh -t ${SAVE_DIR}/Retrieval/pyserini/search_results/QA_GrailQA_Freebase_BM25/train_fid_SPQA.json -e ${SAVE_DIR}/Retrieval/pyserini/search_results/QA_GrailQA_Freebase_BM25/dev_fid_SPQA.json -s 30000 

The -t argument is the path to the training data, -e is the path to the evaluation data, and -s is the number of training steps. We recommend 30,000 steps for WebQSP, GrailQA, and FreebaseQA, and 60,000 steps for CWQ.

Inference with FiD:

bash bash/run_test.sh -d ${SAVE_DIR}/Retrieval/pyserini/search_results/QA_WebQSP_Freebase/test_fid_SPQA.json -m ${MODEL_DIR}/Reading/FiD/WebQSP_Freebase_DPR_FiDlarge -b 10

The -d argument is the path to the inference queries, -m is the path to the trained model, and -b is the beam size. We recommend a beam size of 10 for WebQSP and GrailQA, 1 for FreebaseQA, and 20 for CWQ.

5. Evaluation

cd DecAF/Datasets/QA

python evaluate.py --dataset GrailQA --result_path ${SAVE_DIR}/Reading/FiD/QA_GrailQA_Freebase_BM25_large_100_p100_b10/final_output_dev_fid_SPQA.json

You can change the --dataset argument to WebQSP, CWQ, or FreebaseQA, and the --result_path argument to the path of your own inference results.
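
For orientation, the headline metrics are set-level F1 (WebQSP, GrailQA) and Hits@1 (CWQ, FreebaseQA). A simplified sketch of how these are typically computed is below; evaluate.py may additionally normalize entity names or aggregate over multiple gold parses.

def answer_f1(pred_answers, gold_answers):
    """Set-level F1 between predicted and gold answer sets."""
    pred, gold = set(pred_answers), set(gold_answers)
    if not pred or not gold:
        return float(pred == gold)
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def hits_at_1(pred_answers, gold_answers):
    """1.0 if the top-ranked predicted answer is a gold answer."""
    return float(bool(pred_answers) and pred_answers[0] in set(gold_answers))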

6. Pre-trained Models and Predicted Results

| FiD Model | Prediction | Metric |
|-----------|------------|--------|
| QA_WebQSP_Freebase_BM25_large_100 | p100_b10 (test) | F1=75.3 |
| QA_WebQSP_Freebase_DPR_large_100 | p100_b10 (test) | F1=77.1 |
| QA_WebQSP_Freebase_DPR_3b_100 | p100_b15 (test) | F1=78.8 |
| QA_GrailQA_Freebase_BM25_large_100 | p100_b10 (dev) | F1=78.7 |
| QA_GrailQA_Freebase_BM25_3b_100 | p100_b15 (dev) | F1=81.4 |
| QA_CWQ_Freebase_BM25_large_100 | p100_b20 (test) | Hits@1=68.7 |
| QA_CWQ_Freebase_BM25_3b_100 | p100_b15 (test) | Hits@1=70.4 |
| QA_FreebaseQA_Freebase_BM25_large_100 | p100_b1 (test) | Hits@1=80.6 |
