GSidiropoulos / kgsqa_for_unseen_domains

Knowledge Graph Simple Question Answering for Unseen Domains

This is the official implementation of the following paper: Georgios Sidiropoulos, Nikos Voskarides, Evangelos Kanoulas. Knowledge Graph Simple Question Answering for Unseen Domains. In: Proceedings of AKBC. 2020

Installation

Requirements

Our framework requires Python 3.6. The other dependencies are listed in requirements.txt.

Set up

Run the following commands to clone the repository and install our framework:

git clone https://github.com/GSidiropoulos/kgsqa_for_unseen_domains.git
cd kgsqa_for_unseen_domains
pip install -r requirements.txt

Download data

Download the word embeddings, the SimpleQuestions dataset, and the other data our framework needs:

bash setup.sh

1. Mention Detection

Create data for Mention Detection

python md_data.py --path_load_sq data/SimpleQuestions_v2/ --path_load_mid2ent data/ --target_domain <domain> --path_save <path to MD data output dir>

e.g.

python md_data.py --path_load_sq data/SimpleQuestions_v2/ --path_load_mid2ent data/ --target_domain astronomy --path_save data/md_astronomy/

This generates the data we need to train the Mention Detection model. Data are generated under data/md_astronomy/
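To prepare data for several target domains, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper name md_data_command is hypothetical, and the output directory pattern data/md_<domain>/ is only inferred from the astronomy example. It uses the flags shown above and nothing else.

```python
import shlex


def md_data_command(domain, sq_dir="data/SimpleQuestions_v2/", mid2ent_dir="data/"):
    """Build the md_data.py argument list for one target domain.

    Hypothetical helper: the save path pattern data/md_<domain>/ is an
    assumption taken from the astronomy example in this README.
    """
    return [
        "python", "md_data.py",
        "--path_load_sq", sq_dir,
        "--path_load_mid2ent", mid2ent_dir,
        "--target_domain", domain,
        "--path_save", "data/md_{}/".format(domain),
    ]


# Print the command for each domain; append further domain names as needed.
for domain in ["astronomy"]:
    print(" ".join(shlex.quote(arg) for arg in md_data_command(domain)))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run to execute the step directly.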

Train a model for Mention Detection

python train_md.py --job_id <slurm job id> --path_load <path to MD output dir> --path_save <path to MD models dir> --model_type <type> --target_domain <domain> --layers 2 --units 600 600 --rec_dropout 0.2 0.2 --dropout 0.4 0  --lr 0.001 --batch_size 300 --max_epochs 50 --save_model

e.g.

python train_md.py --job_id 1 --path_load data/md_astronomy/ --path_save saved_models/md/ --model_type rbilstm --target_domain astronomy --layers 2 --units 600 600 --rec_dropout 0.2 0.2 --dropout 0.4 0  --lr 0.001 --batch_size 300 --max_epochs 1 --save_model

This saves the trained MD model under saved_models/md/1/ and also writes a data_new.csv file under both data/md_astronomy/test/ and data/md_astronomy/valid/, containing the predictions of the MD model.
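In the commands above, --units, --rec_dropout and --dropout each take two values while --layers is 2, which suggests one value per layer. The sketch below illustrates that reading with argparse. It is an assumption inferred from the example, not the actual argument handling of train_md.py, and the function names build_parser and parse_and_check are hypothetical.

```python
import argparse


def build_parser():
    """Toy stand-in for the list-valued flags of train_md.py (assumed layout)."""
    parser = argparse.ArgumentParser(description="Sketch of per-layer flags")
    parser.add_argument("--layers", type=int, default=2)
    # nargs="+" collects one or more values into a list.
    parser.add_argument("--units", type=int, nargs="+", default=[600, 600])
    parser.add_argument("--rec_dropout", type=float, nargs="+", default=[0.2, 0.2])
    parser.add_argument("--dropout", type=float, nargs="+", default=[0.4, 0.0])
    return parser


def parse_and_check(argv):
    """Parse argv and check that every per-layer list has --layers entries."""
    args = build_parser().parse_args(argv)
    for name in ("units", "rec_dropout", "dropout"):
        values = getattr(args, name)
        if len(values) != args.layers:
            raise ValueError(
                "--{} needs {} values, got {}".format(name, args.layers, len(values))
            )
    return args


args = parse_and_check(
    "--layers 2 --units 600 600 --rec_dropout 0.2 0.2 --dropout 0.4 0".split()
)
print(args.units, args.rec_dropout, args.dropout)
```

Under this reading, changing --layers also means changing the number of values given to the other three flags.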

2. Candidate Generation

python candidate_generation.py --path_load_md_data <path to dir of data_new.csv> --path_load_mid2ent data/ --path_inverted_index data/ --path_save <path to CG output dir>

e.g.

python candidate_generation.py --path_load_md_data data/md_astronomy/test/ --path_load_mid2ent data/ --path_inverted_index data/ --path_save data/md_astronomy/test/

This generates the candidates.pkl file under data/md_astronomy/test/
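For readers unfamiliar with inverted-index lookup, here is a toy sketch of the general idea: map each name token to the entities that contain it, then rank entities by how many tokens they share with the detected mention. This is not the logic of candidate_generation.py, whose matching and ranking may differ. The MIDs, names and function names below are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(mid2names):
    """Map each lowercase token to the set of MIDs whose name contains it."""
    index = defaultdict(set)
    for mid, names in mid2names.items():
        for name in names:
            for token in name.lower().split():
                index[token].add(mid)
    return index


def generate_candidates(mention, index):
    """Return MIDs sharing at least one token with the mention, best first."""
    counts = defaultdict(int)
    for token in set(mention.lower().split()):
        for mid in index.get(token, ()):
            counts[mid] += 1
    # Sort by descending token overlap, then by MID so the order is stable.
    return sorted(counts, key=lambda mid: (-counts[mid], mid))


# Hypothetical entity names; real MIDs come from the Freebase subset.
toy_mid2names = {
    "m.001": ["halley's comet"],
    "m.002": ["comet hale-bopp"],
    "m.003": ["mars"],
}
toy_index = build_inverted_index(toy_mid2names)
print(generate_candidates("halley's comet", toy_index))
```

In the toy data, m.001 shares two tokens with the mention and m.002 shares one, so m.001 is ranked first and m.003 is not returned at all.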

3. Question Generation over KG

Create Keywords for each relation

Follow the instructions in create_keywords

Generate questions w.r.t keywords

Use Zero-shot KGQG to generate the questions. The original work uses, as the textual context of a predicate, the set of words that appear on the dependency path between the subject and object mentions in a sentence. In our approach, the textual context of a predicate is instead a set of keywords. Accordingly, replace the respective files in Zero-shot KGQG with the ones generated in the previous step.

4. Relation Prediction

Create data for Relation Prediction

python rp_data.py --path_load_sq data/SimpleQuestions_v2/ --path_load_md_data <path to MD data output dir> --path_load_mid2ent data/ --path_load_synthetic <path to synthetic questions> --path_save <path to RP data output dir> --target_domain <domain> --placeholders --use_synthetic_questions

e.g.

python rp_data.py --path_load_sq data/SimpleQuestions_v2/ --path_load_md_data data/md_astronomy/ --path_load_mid2ent data/ --path_load_synthetic data/synthetic_questions/astronomy_synthetic.csv --path_save data/rp_astronomy/ --target_domain astronomy --placeholders --use_synthetic_questions

This generates the data we need to train the Relation Prediction model. Data are generated under data/rp_astronomy/
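The --placeholders flag is not documented here. A plausible reading, common in relation prediction for simple questions, is that the detected subject mention is replaced by a placeholder token before the question is passed to the RP model. The sketch below illustrates that reading only. The function name apply_placeholder, the token "<e>" and the 0/1 tag format are all assumptions, not taken from rp_data.py.

```python
def apply_placeholder(tokens, tags, placeholder="<e>"):
    """Collapse each contiguous run of tagged tokens into one placeholder.

    tokens: list of question tokens.
    tags:   list of 0/1 flags, 1 marking tokens that belong to the mention
            (assumed format; the MD model's real output may differ).
    """
    out = []
    in_mention = False
    for token, tag in zip(tokens, tags):
        if tag:
            if not in_mention:
                out.append(placeholder)  # emit the placeholder once per run
            in_mention = True
        else:
            out.append(token)
            in_mention = False
    return out


question = "where is halley's comet located".split()
mention_tags = [0, 0, 1, 1, 0]
print(" ".join(apply_placeholder(question, mention_tags)))
```

Replacing the mention in this way keeps the relation model from relying on the surface form of specific entities, which matters when the target domain is unseen during training.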

Train a model for Relation Prediction

python train_rp.py --job_id <slurm job id> --path_load <path to RP data output dir> --path_save <path to RP models dir> --model_type lstm_words --target_domain <domain> --units 400 --lr 0.001 --batch_size 300 --max_epochs 5 --save_model --path_test_candidates <path to CG output dir> --use_synthetic_questions

e.g.

python train_rp.py --job_id 2 --path_load data/rp_astronomy/ --path_save saved_models/rp/ --model_type lstm_words --target_domain astronomy --units 400 --lr 0.001 --batch_size 300 --max_epochs 5 --save_model --path_test_candidates data/md_astronomy/test/ --use_synthetic_questions --total_negatives 1 --negatives_intersection 1

This saves the trained RP model under saved_models/rp/2/ and also generates rp_test_results.pkl, which contains the predictions of the RP model.
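The structure of rp_test_results.pkl is not described in this README, so a quick first step is to check what kind of object the file holds. The helper below is a generic sketch (describe_pickle is a hypothetical name) that makes no assumption about the contents. Only unpickle files you produced yourself or otherwise trust, since loading a pickle can execute arbitrary code.

```python
import pickle


def describe_pickle(path):
    """Load a pickle file and return (type name, length or None)."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size
```

For example, describe_pickle("saved_models/rp/2/rp_test_results.pkl") reports the container type and the number of entries, and the same call works for candidates.pkl.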

5. Answer Selection

python answer_selection.py --path_load_gold <path to RP data.csv> --path_cg  <path to candidates.pkl> --path_rp_predictions <path to rp_test_results.pkl> --path_fb data/SimpleQuestions_v2/freebase-subsets/freebase-FB2M.txt --path_pred2ix <path to pred2ix.pkl> --path_ix2pred <path to ix2pred.pkl> --path_mid2entity <path to mid2ent.pkl>

e.g.

python answer_selection.py --path_load_gold data/rp_astronomy/test/data.csv --path_cg  data/md_astronomy/test/candidates.pkl --path_rp_predictions saved_models/rp/2/rp_test_results.pkl --path_fb data/SimpleQuestions_v2/freebase-subsets/freebase-FB2M.txt --path_pred2ix data/rp_astronomy/pred2ix.pkl --path_ix2pred data/rp_astronomy/ix2pred.pkl --path_mid2entity data/mid2ent.pkl

This reports QA accuracy w.r.t. the predicted object and w.r.t. the predicted (subject, relation) pair.
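The two accuracies can differ, because a wrong (subject, relation) pair may still lead to the correct object. The sketch below shows one way to compute both on toy triples. It is an illustration under assumed definitions (exact match on the object, exact match on the pair), not the code of answer_selection.py, and the function name and triples are hypothetical.

```python
def qa_accuracy(gold, predicted):
    """Return (object accuracy, (subject, relation) pair accuracy).

    gold, predicted: equal-length lists of (subject, relation, object)
    tuples, one per question.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0, 0.0
    n = len(gold)
    object_hits = sum(g[2] == p[2] for g, p in zip(gold, predicted))
    pair_hits = sum(g[:2] == p[:2] for g, p in zip(gold, predicted))
    return object_hits / n, pair_hits / n


# Toy data: the second prediction picks the wrong subject but the right object.
toy_gold = [("s1", "r1", "o1"), ("s2", "r2", "o2")]
toy_pred = [("s1", "r1", "o1"), ("s3", "r2", "o2")]
print(qa_accuracy(toy_gold, toy_pred))
```

On the toy data the object accuracy is 1.0 while the pair accuracy is 0.5, which is why both numbers are worth reporting.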

Citation

If you find this work helpful or use it in your own work, please cite our paper.

@inproceedings{
sidiropoulos2020knowledge,
title={Knowledge Graph Simple Question Answering for Unseen Domains},
author={Georgios Sidiropoulos and Nikos Voskarides and Evangelos Kanoulas},
booktitle={Automated Knowledge Base Construction},
year={2020},
url={https://openreview.net/forum?id=Ie2Y94Ty8K},
doi={10.24432/C5H01X}
}

License: MIT License