honghanhh / oeis_similarity

Premise Selection using OEIS portal

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Premise Selection

This repository is an extended version from the paper IJS at TextGraphs-16 Natural Language Premise Selection Task: Will Contextual Information Improve Natural Language Premise Selection?.

Datasets

We crawled 3,600,000 statements from The On-Line Encyclopedia of Integer Sequences. For each statements (based on the query in form of Axxxxxx where x is the integer). Check out the sample of the output in ./data.

Preresequisite

Install the necessary libaries by the following command:

pip install -r requirements.txt

Implemetation

To re-crawl, please run the following command:

python crawling.py --output [output_dir]

Or check the notebook ./notebook/test_crawling.ipynb.

To preprocess the crawled data into format of training set and the knowledge base, run the following command:

python preprocess.py --input [input_dir] --output [output_dir]

To train the similarity model, run the following command:

./model.py --knowledge_base data/knowledge_base.json --statements data/statements.json > predict.txt

To evaluate the performance, run the following command:

./evaluate.py --gold_standard data/statements.json predict.txt

If you are too lazy to run all the above command separately, run the following command and waiting for the result =)).

chmod +x user run.sh
./run.sh

Reference

Tran, Thi Hong Hanh, et al. "IJS at TextGraphs-16 Natural Language Premise Selection Task: Will Contextual Information Improve Natural Language Premise Selection?." Proceedings of TextGraphs-16: Graph-based Methods for Natural Language Processing. 2022.

Contributor

About

Premise Selection using OEIS portal

License:GNU General Public License v3.0


Languages

Language:Python 62.2%Language:Shell 37.8%