nju-websoft / SkeletonKBQA

Skeleton parsing for complex question answering over knowledge bases (JoWS 2022)

SkeletonKBQA: Skeleton Parsing for Complex Question Answering over Knowledge Bases

Code for the journal paper "Skeleton Parsing for Complex Question Answering over Knowledge Bases". If you have any questions, please email the author (ywsun at smail.nju.edu.cn).

Project Structure:

File         Description
kbcqa        Code for the skeleton-based SP and IR approaches
skeletons    Skeleton bank from three complex KBQA datasets

Skeleton Bank

We annotate and publish a skeleton bank of 15,166 questions from three KBQA datasets.

The skeleton bank is in JSON format. An example:

{
"question": "People from the country with the capital Brussels speak what languages ?",
"skeleton": [
	{
		"question": "People from the country with the capital Brussels speak what languages ?",
		"text_span": "with the capital Brussels",
		"headword_index": 3,
		"attachment_relation": "nmod"
	},
	{
		"question": "People from the country speak what languages ?",
		"text_span": "from the country",
		"headword_index": 0,
		"attachment_relation": "nmod"
	}
]
}
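
The example above can be read programmatically. The following is a minimal sketch, not code from the repository: it assumes that headword_index is a 0-based index into the whitespace-separated tokens of the step's question, and that removing text_span from a step's question yields the question of the next step. Both assumptions are consistent with the example, but are not stated by the skeleton bank itself. The helper names headword and remove_span are hypothetical.

```python
import json

# The skeleton bank entry shown above, embedded as a string for illustration.
entry = json.loads("""
{
  "question": "People from the country with the capital Brussels speak what languages ?",
  "skeleton": [
    {
      "question": "People from the country with the capital Brussels speak what languages ?",
      "text_span": "with the capital Brussels",
      "headword_index": 3,
      "attachment_relation": "nmod"
    },
    {
      "question": "People from the country speak what languages ?",
      "text_span": "from the country",
      "headword_index": 0,
      "attachment_relation": "nmod"
    }
  ]
}
""")


def headword(step):
    """Token that the text span attaches to (assumes 0-based whitespace tokens)."""
    return step["question"].split()[step["headword_index"]]


def remove_span(step):
    """Question left after deleting the text span and normalizing spaces."""
    remaining = step["question"].replace(step["text_span"], "", 1)
    return " ".join(remaining.split())


for step in entry["skeleton"]:
    print(f'{step["text_span"]!r} --{step["attachment_relation"]}--> {headword(step)!r}')
    print("  remaining question:", remove_span(step))
```

Under these assumptions, the first span attaches to "country" and the second to "People", and the text left after the last step is the simplified trunk of the question.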

The sections below explain how to run the code in the kbcqa folder.

Requirements

Configuration

The configuration of SkeletonKBQA is in kbcqa/common/globals_args.py.

  • root: root folder of all resources and datasets (default: ../dataset).
  • q_mode: the KBQA dataset to use: lcquad, graphq, or cwq.
  • sutime: path to the jar files of the SUTime Java library.
  • corenlp_ip_port: IP address and port of the Stanford CoreNLP server.
  • dbpedia_pyodbc: ODBC connection to the DBpedia Virtuoso server.
  • dbpedia_sparql_html: web address of the DBpedia Virtuoso server.
  • freebase_pyodbc: ODBC connection to the Freebase Virtuoso server.
  • freebase_sparql_html: web address of the Freebase Virtuoso server.
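
As a rough illustration, the settings listed above might be filled in and sanity-checked as follows. This is a sketch only: it does not reproduce globals_args.py, whose actual layout and value formats may differ. Every value other than the default root and the three q_mode names is a placeholder assumption, and check_settings is a hypothetical helper.

```python
# Illustrative settings only; NOT the contents of kbcqa/common/globals_args.py.
VALID_Q_MODES = ("lcquad", "graphq", "cwq")

settings = {
    "root": "../dataset",                                    # default named above
    "q_mode": "lcquad",                                      # one of VALID_Q_MODES
    "sutime": "../dataset/sutime_jars/",                     # placeholder path
    "corenlp_ip_port": "http://localhost:9000",              # placeholder address
    "dbpedia_pyodbc": "DSN=dbpedia;UID=user;PWD=secret",     # placeholder ODBC string
    "dbpedia_sparql_html": "http://localhost:8890/sparql",   # placeholder URL
    "freebase_pyodbc": "DSN=freebase;UID=user;PWD=secret",   # placeholder ODBC string
    "freebase_sparql_html": "http://localhost:8891/sparql",  # placeholder URL
}


def check_settings(cfg):
    """Return a list of human-readable problems found in cfg (empty if none)."""
    problems = []
    if cfg.get("q_mode") not in VALID_Q_MODES:
        problems.append(f"q_mode must be one of {VALID_Q_MODES}")
    for key in ("root", "sutime", "corenlp_ip_port"):
        if not cfg.get(key):
            problems.append(f"missing value for {key}")
    return problems


print(check_settings(settings))
```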

Common Resources

The zip file from Google Drive contains three parts:

  • Stanford CoreNLP server
  • SUTime Java library
  • BERT pre-trained models

Download and unzip the zip file, then copy its contents to the root folder.

Knowledge Bases

Download a Virtuoso server and load the KB (DBpedia or Freebase) into it.

You only need to load the KB that corresponds to your KBQA dataset.
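
To make the correspondence concrete: LC-QuAD 1.0 is built over DBpedia, while GraphQuestions and ComplexWebQuestions are built over Freebase. The lookup below is a small sketch using the q_mode values and setting names from the Configuration section; the helper kb_settings_for is hypothetical and not part of the repository.

```python
# Which KB each dataset (q_mode value) is built over.
KB_FOR_Q_MODE = {
    "lcquad": "dbpedia",
    "graphq": "freebase",
    "cwq": "freebase",
}


def kb_settings_for(q_mode):
    """Names of the configuration entries that matter for the given dataset."""
    kb = KB_FOR_Q_MODE[q_mode]
    return [f"{kb}_pyodbc", f"{kb}_sparql_html"]


for mode in KB_FOR_Q_MODE:
    print(mode, "->", kb_settings_for(mode))
```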

LC-QuAD 1.0 Resources

The zip file from Google Drive contains three parts:

  • LC-QuAD 1.0 datasets
  • Its skeleton parsing models
  • Its corresponding KB entity-related lexicons

Download and unzip the zip file, then copy its contents to the root folder.

GraphQuestions Resources

The zip file from Google Drive contains three parts:

  • GraphQuestions datasets
  • Its skeleton parsing models
  • Its corresponding KB entity-related lexicons

Download and unzip the zip file, then copy its contents to the root folder.

ComplexWebQuestions 1.1 Resources

The zip file from Google Drive contains three parts:

  • ComplexWebQuestions 1.1 datasets
  • Its skeleton parsing models
  • Its corresponding KB entity-related lexicons

Download and unzip the zip file, then copy its contents to the root folder.

Run SkeletonKBQA

SkeletonKBQA contains two KBQA approaches: SSP and SIR.

  • Skeleton-based semantic parsing approach (SSP) has four modules:
    • Ungrounded query generation
    • Entity linking
    • Candidate grounded query generation
    • Semantic matching

The four modules above correspond to the module argument in kbcqa/method_sp/sp_pipeline.py.

Run the provided SSP scripts as:

bash run_ssp_LCQ.sh
bash run_ssp_GraphQ.sh
bash run_ssp_CWQ.sh

  • Skeleton-based information retrieval approach (SIR) has three modules:
    • Node recognition and linking
    • Candidate grounded path generation
    • Semantic matching

The three modules above correspond to the module argument in kbcqa/method_ir/ir_pipeline.py.

Run the provided SIR scripts as:

bash run_sir_LCQ.sh
bash run_sir_GraphQ.sh
bash run_sir_CWQ.sh
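
The six scripts above follow one naming pattern, so the script for a given approach and dataset can be picked programmatically. The following is a minimal sketch: the helpers script_for and run are hypothetical, and it assumes the scripts sit in the current working directory and that q_mode in globals_args.py is set to the matching dataset.

```python
import subprocess

# Suffix used in the script names for each q_mode value.
SCRIPT_SUFFIX = {"lcquad": "LCQ", "graphq": "GraphQ", "cwq": "CWQ"}


def script_for(approach, q_mode):
    """Return the script name for an approach ("ssp" or "sir") and a dataset."""
    if approach not in ("ssp", "sir"):
        raise ValueError(f"unknown approach: {approach!r}")
    return f"run_{approach}_{SCRIPT_SUFFIX[q_mode]}.sh"


def run(approach, q_mode):
    """Run the chosen script with bash; raises CalledProcessError on failure."""
    return subprocess.run(["bash", script_for(approach, q_mode)], check=True)


print(script_for("ssp", "lcquad"))
print(script_for("sir", "cwq"))
```

Calling run("ssp", "lcquad") would then be equivalent to typing bash run_ssp_LCQ.sh.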

Contacts

If you have any difficulty or questions about running the code, reproducing the experimental results, or skeleton parsing, please email the author (ywsun at smail.nju.edu.cn).

License: Apache License 2.0

