datquocnguyen / STransE

STransE: a novel embedding model of entities and relationships in knowledge bases (NAACL 2016)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

STransE: a novel embedding model of entities and relationships in knowledge bases

This STransE program provides the implementation of the embedding model STransE for knowledge base completion, as described in my NAACL-HLT 2016 paper:

Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu and Mark Johnson. 2016. STransE: a novel embedding model of entities and relationships in knowledge bases. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2016, pp. 460-466. [.bib]

Please cite my NAACL-HLT 2016 paper whenever STransE is used to produce published results or incorporated into other software.

The program also provides the implementation of the embedding model TransE. See an overview of embedding models of entities and relationships for knowledge base completion at HERE.

I would highly appreciate to have your bug reports, comments and suggestions about STransE. As a free open-source implementation, STransE is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

Usage

Compile the program

Suppose that g++ is already set to run in command line or terminal. After you clone or download (and then unzip) the program, you have to compile the program by executing:

SOURCE_DIR$ g++ -I ../SOURCE_DIR/ STransE.cpp -o STransE -O2 -fopenmp -lpthread

Note that the actual command starts from g++. Here SOURCE_DIR is simply used to denote the source code directory. Examples:

STransE$ g++ -I ../STransE/ STransE.cpp -o STransE -O2 -fopenmp -lpthread

STransE-master$ g++ -I ../STransE-master/ STransE.cpp -o STransE -O2 -fopenmp -lpthread

Run the program

To run the program, we perform:

$./STransE -model 1_OR_0 -data CORPUS_DIR_PATH -size <int> -l1 1_OR_0 -margin <double> -lrate <double> [-init 1_OR_0] [-nepoch <int>] [-evalStep <int>] [-nthreads <int>]

//For Windows OS: use ./STransE.exe instead of ./STransE

where hyper-parameters in [ ] are optional!

Required parameters:

-model: Specify the embedding model STransE or TransE. It gets value 1 or 0, where 1 denotes STransE while 0 denotes TransE.

-data: Specify path to the dataset directory. Find the dataset format instructions in the Datasets folder inside the source code directory.

-size: Specify the number of vector dimensions.

-l1: Specify the L1 or L2 norm. It gets value 1 or 0, where 1 denotes L1-norm while 0 denotes L2-norm.

-margin: Specify the margin hyper-parameter.

-lrate: Specify the SGD learning rate.

Optional parameters:

-init: Use when -model gets value 1 (i.e. for STransE). It gets value 1 or 0 in which the default value is 1. The value 1 means that the entity and relation vectors are initialized from external files (e.g. entity2vec.init and relation2vec.init in the Datasets folder inside the source code directory), while the value 0 means that the entity and relation vectors are randomly initialized.

-nepoch: Specify the number of training epochs. The default value is 2000.

-evalStep: Specify a step to save and evaluate the model, e.g., evaluating the model after each step of 500 training epochs. The default value is 2000.

-nthreads: Specify the number of multiple threads used for evaluation. The default value is 1. Note that evaluating link/entity prediction in knowledge bases is slow. If you can afford to run the program with many threads, the evaluation process will be much faster, thus you can even evaluate the model after each training epoch.

Evaluation metrics

For evaluating link/entity prediction, the program provides ranking-based scores as evaluation metrics, including the mean rank, the mean reciprocal rank, Hits@1, Hits@5 and Hits@10 in two setting protocols "Raw" and "Filtered".

Reproduce the STransE results

To reproduce the STransE results published in my NAACL-HLT 2016 paper, execute:

$ ./STransE -model 1 -data Datasets/WN18/ -size 50 -margin 5 -l1 1 -lrate 0.0005

$ ./STransE -model 1 -data Datasets/FB15k/ -size 100 -margin 1 -l1 1 -lrate 0.0001

About

STransE: a novel embedding model of entities and relationships in knowledge bases (NAACL 2016)

License:Other


Languages

Language:C++ 98.4%Language:C 1.4%Language:CMake 0.2%