Code for the journal paper "Skeleton Parsing for Complex Question Answering over Knowledge Bases". If you have any questions, please email ywsun at smail.nju.edu.cn.
File | Description |
---|---|
kbcqa | Code for the skeleton-based SP and IR approaches |
skeletons | Skeleton bank from three complex KBQA datasets |
We annotate and publish a skeleton bank of 15,166 questions from three KBQA datasets.
The skeleton bank is in JSON format. An example:
```json
{
  "question": "People from the country with the capital Brussels speak what languages ?",
  "skeleton": [
    {
      "question": "People from the country with the capital Brussels speak what languages ?",
      "text_span": "with the capital Brussels",
      "headword_index": 3,
      "attachment_relation": "nmod"
    },
    {
      "question": "People from the country speak what languages ?",
      "text_span": "from the country",
      "headword_index": 0,
      "attachment_relation": "nmod"
    }
  ]
}
```
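The structure of an entry can be illustrated with a short Python sketch that walks the skeleton steps of the example. It rests on two assumptions read off this single example, not on a documented specification: `headword_index` is a 0-based index over whitespace-separated tokens of the step's question, and removing a step's `text_span` yields the question of the next step. The helper names `find_span` and `decompose` are hypothetical and are not part of the repository.

```python
import json

# The example entry from this README.
entry = json.loads("""
{
  "question": "People from the country with the capital Brussels speak what languages ?",
  "skeleton": [
    {"question": "People from the country with the capital Brussels speak what languages ?",
     "text_span": "with the capital Brussels",
     "headword_index": 3,
     "attachment_relation": "nmod"},
    {"question": "People from the country speak what languages ?",
     "text_span": "from the country",
     "headword_index": 0,
     "attachment_relation": "nmod"}
  ]
}
""")


def find_span(tokens, span):
    """Return the start index of the first occurrence of span in tokens."""
    for i in range(len(tokens) - len(span) + 1):
        if tokens[i:i + len(span)] == span:
            return i
    raise ValueError(f"span not found: {' '.join(span)}")


def decompose(entry):
    """Walk the skeleton steps; return (attachments, remaining question)."""
    attachments = []
    current = entry["question"]
    for step in entry["skeleton"]:
        # Assumption: each step's question is what remained after the previous step.
        if step["question"] != current:
            raise ValueError("step question does not match the remaining question")
        tokens = current.split()
        span = step["text_span"].split()
        start = find_span(tokens, span)
        # Assumption: headword_index is a 0-based whitespace-token index.
        head = tokens[step["headword_index"]]
        attachments.append((step["text_span"], step["attachment_relation"], head))
        # Drop the span to obtain the question for the next step.
        current = " ".join(tokens[:start] + tokens[start + len(span):])
    return attachments, current


attachments, trunk = decompose(entry)
for span, relation, head in attachments:
    print(f"{span!r} --{relation}--> {head!r}")
print("remaining:", trunk)
```

Under these assumptions, the first span attaches to "country", the second to "People", and the question that remains is "People speak what languages ?".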
The following sections explain how to run the code in the kbcqa folder.
The configuration of SkeletonKBQA is in kbcqa/common/globals_args.py.
- root: root folder of all resources and datasets, default ../dataset.
- q_mode: the KBQA dataset to use, one of lcquad, graphq, or cwq.
- sutime: path to the jar files of the SUTime Java library.
- corenlp_ip_port: IP address and port of the Stanford CoreNLP server.
- dbpedia_pyodbc: ODBC connection to the DBpedia Virtuoso server.
- dbpedia_sparql_html: web address of the DBpedia Virtuoso server.
- freebase_pyodbc: ODBC connection to the Freebase Virtuoso server.
- freebase_sparql_html: web address of the Freebase Virtuoso server.
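To make the options concrete, here is a minimal Python sketch of a settings container with the same field names. It is not the layout of kbcqa/common/globals_args.py, which may be organized differently; the class `KbqaConfig`, the method `unset_fields`, and the angle-bracket placeholder values are all hypothetical. Only the field names, the `../dataset` default, and the three `q_mode` values come from the list above.

```python
from dataclasses import dataclass

VALID_Q_MODES = ("lcquad", "graphq", "cwq")


@dataclass
class KbqaConfig:
    """Hypothetical container; field names follow the list above."""
    root: str = "../dataset"  # default stated in this README
    q_mode: str = "lcquad"
    # The values below are placeholders to be replaced with local settings.
    sutime: str = "<path to SUTime jar files>"
    corenlp_ip_port: str = "<CoreNLP server address>"
    dbpedia_pyodbc: str = "<DBpedia ODBC connection>"
    dbpedia_sparql_html: str = "<DBpedia web address>"
    freebase_pyodbc: str = "<Freebase ODBC connection>"
    freebase_sparql_html: str = "<Freebase web address>"

    def __post_init__(self):
        # Reject dataset names that the README does not list.
        if self.q_mode not in VALID_Q_MODES:
            raise ValueError(f"q_mode must be one of {VALID_Q_MODES}, got {self.q_mode!r}")

    def unset_fields(self):
        """Return the names of fields that still hold a <placeholder>."""
        return [name for name, value in vars(self).items() if value.startswith("<")]


config = KbqaConfig(q_mode="cwq")
print("still to fill in:", config.unset_fields())
```

Validating `q_mode` at construction time surfaces a mistyped dataset name before any server connection is attempted.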
The zip file from Google Drive contains three parts:
- Stanford CoreNLP server
- SUTime Java library
- BERT pre-trained models
Download the zip file, unzip it, and copy its contents to the root folder.
- DBpedia 201604 version for LC-QuAD 1.0
- Freebase 2013 version for GraphQuestions
- Freebase latest version for ComplexWebQuestions 1.1
Download a Virtuoso server and load the above KBs.
You only need to load the KB that corresponds to your KBQA dataset.
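The dataset-to-KB correspondence above can be written down as a small lookup. This sketch assumes that the `q_mode` values lcquad, graphq, and cwq refer to LC-QuAD 1.0, GraphQuestions, and ComplexWebQuestions 1.1 respectively, and that a dataset only needs the configuration keys of its own KB; both are inferences from the names, not statements from the code. The function `required_kb` is hypothetical.

```python
# q_mode -> (KB version to load, prefix of the related configuration keys)
KB_FOR_Q_MODE = {
    "lcquad": ("DBpedia 201604", "dbpedia"),
    "graphq": ("Freebase 2013", "freebase"),
    "cwq": ("Freebase latest", "freebase"),
}


def required_kb(q_mode):
    """Return the KB to load and the configuration keys assumed to matter for it."""
    try:
        kb_name, prefix = KB_FOR_Q_MODE[q_mode]
    except KeyError:
        raise ValueError(f"unknown q_mode: {q_mode!r}") from None
    return kb_name, [f"{prefix}_pyodbc", f"{prefix}_sparql_html"]


for mode in KB_FOR_Q_MODE:
    kb, keys = required_kb(mode)
    print(f"{mode}: load {kb}; set {', '.join(keys)}")
```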
The LC-QuAD 1.0 zip file from Google Drive contains three parts:
- LC-QuAD 1.0 datasets
- Its skeleton parsing models
- Its corresponding KB entity-related lexicons
Download the zip file, unzip it, and copy its contents to the root folder.
The GraphQuestions zip file from Google Drive contains three parts:
- GraphQuestions datasets
- Its skeleton parsing models
- Its corresponding KB entity-related lexicons
Download the zip file, unzip it, and copy its contents to the root folder.
The ComplexWebQuestions 1.1 zip file from Google Drive contains three parts:
- ComplexWebQuestions 1.1 datasets
- Its skeleton parsing models
- Its corresponding KB entity-related lexicons
Download the zip file, unzip it, and copy its contents to the root folder.
SkeletonKBQA contains two KBQA approaches: SSP and SIR.
- Skeleton-based semantic parsing approach (SSP) has four modules:
  - Ungrounded query generation
  - Entity linking
  - Candidate grounded query generation
  - Semantic matching

The four modules above correspond to the module argument in kbcqa/method_sp/sp_pipeline.py.
Run the provided SSP scripts as follows:
```
bash run_ssp_LCQ.sh
bash run_ssp_GraphQ.sh
bash run_ssp_CWQ.sh
```
- Skeleton-based information retrieval approach (SIR) has three modules:
  - Node recognition and linking
  - Candidate grounded path generation
  - Semantic matching

The three modules above correspond to the module argument in kbcqa/method_ir/ir_pipeline.py.
Run the provided SIR scripts as follows:
```
bash run_sir_LCQ.sh
bash run_sir_GraphQ.sh
bash run_sir_CWQ.sh
```
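The six script names follow one pattern, so picking the right script for an approach and dataset can be sketched in Python. The module lists and script names are taken from this README; the pairing of `q_mode` values with the suffixes LCQ, GraphQ, and CWQ is an assumption based on the names, and `run_command` is a hypothetical helper that only builds the command without executing it.

```python
# Modules of each approach, as listed in this README.
MODULES = {
    "ssp": [
        "Ungrounded query generation",
        "Entity linking",
        "Candidate grounded query generation",
        "Semantic matching",
    ],
    "sir": [
        "Node recognition and linking",
        "Candidate grounded path generation",
        "Semantic matching",
    ],
}

# Assumed pairing of q_mode values with the suffixes used in the script names.
SCRIPT_SUFFIX = {"lcquad": "LCQ", "graphq": "GraphQ", "cwq": "CWQ"}


def run_command(approach, q_mode):
    """Build (but do not run) the command for one approach and dataset."""
    if approach not in MODULES:
        raise ValueError(f"approach must be one of {sorted(MODULES)}, got {approach!r}")
    if q_mode not in SCRIPT_SUFFIX:
        raise ValueError(f"q_mode must be one of {sorted(SCRIPT_SUFFIX)}, got {q_mode!r}")
    return ["bash", f"run_{approach}_{SCRIPT_SUFFIX[q_mode]}.sh"]


for approach, modules in MODULES.items():
    print(approach.upper(), "modules:", "; ".join(modules))
    for q_mode in SCRIPT_SUFFIX:
        print("  ", " ".join(run_command(approach, q_mode)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the folder that contains the scripts.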
If you have any difficulty or questions about running the code, reproducing the experimental results, or skeleton parsing, please email ywsun at smail.nju.edu.cn.