FoodKG: A Tool to Enrich Knowledge Graphs Using Machine Learning Techniques

Run FoodKG with one command
FoodKG is available as a Docker image. To run the tool, install Docker on your machine, then run the following command:
docker run -p 5000:5000 gharibim/foodkg
FoodKG will start on localhost, port 5000: 127.0.0.1:5000
A sample input file is provided in the Sample_Input folder,
and a sample context is: http://example.com
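
Once the container is running, a quick way to confirm that something is answering on port 5000 is a small reachability check. The sketch below uses only the Python standard library; the function name `foodkg_is_up` is hypothetical and not part of FoodKG, and it only tests that an HTTP server responds at the address, not that FoodKG itself is working correctly.

```python
import urllib.error
import urllib.request


def foodkg_is_up(base_url="http://127.0.0.1:5000", timeout=3.0):
    """Return True if an HTTP server answers at base_url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or another network-level failure.
        return False
```

Calling `foodkg_is_up()` right after `docker run` may return False for a short while until the Flask server inside the container has finished starting.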

To reproduce the results and build from scratch, follow these steps:

Required libraries:

TensorFlow
Flask
NLTK
Werkzeug
Beautiful Soup
Requests
Download the AGROVEC embedding model from Google Drive, unzip it, then place it in FoodKG/Prediction/AGROVEC/.
After that, download Apache Jena and place it in the Apache Jena directory.
Finally, run python3 FoodKG.py, the main script, which starts the Flask server on localhost.
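
Before starting FoodKG.py, it can help to confirm that the required libraries listed above are installed. The following is a minimal sketch, not part of FoodKG: the helper name `missing_libraries` is hypothetical, and the module names are the usual import names for these packages (Beautiful Soup is imported as `bs4`).

```python
import importlib.util

# Display name -> import name for the required libraries listed above.
REQUIRED = {
    "TensorFlow": "tensorflow",
    "Flask": "flask",
    "NLTK": "nltk",
    "Werkzeug": "werkzeug",
    "Beautiful Soup": "bs4",
    "Requests": "requests",
}


def missing_libraries(required=REQUIRED):
    """Return the display names of libraries that cannot be found."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]
```

An empty list from `missing_libraries()` means every library was found in the current Python environment; it does not check versions.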

AGROVOC & AGROVEC
FoodKG uses our AGROVEC embedding vectors by default. They can be found in Prediction/AGROVEC/.
To use GloVe or any other vectors instead of AGROVEC, add the new vector file to the same directory and change the name in prepare_Models.py. GloVe is listed in the References below.
By default, 1,000,000 words are loaded; you can change this number in prepare_Models.py.
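
To illustrate what a cap on the number of loaded words means, here is a minimal sketch of reading only the first N entries of an embedding file. It is not the code in prepare_Models.py: the function `load_vectors` is hypothetical, and it assumes a GloVe-style text format (one word per line, followed by space-separated floats, with no header line).

```python
def load_vectors(lines, max_words=1_000_000):
    """Read up to max_words embeddings from an iterable of text lines."""
    vectors = {}
    for line in lines:
        if len(vectors) >= max_words:
            break  # stop once the cap is reached
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        vectors[word] = [float(v) for v in values]
    return vectors
```

Because the function takes any iterable of lines, it can be given an open file handle directly, for example `with open(path, encoding="utf-8") as f: vectors = load_vectors(f, max_words=50_000)`, where `path` points to your own vector file. Lowering the cap reduces memory use and load time at the cost of vocabulary coverage.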

Relations Prediction
FoodKG uses the Specialization Tensor Model (STM) to predict the relation between newly added triples. We re-trained the STM model on the AGROVOC triples dataset, and FoodKG uses our pre-trained model Prediction/relations_prediction/args.output by default.

If you want to re-train the STM model yourself, we provide the SPARQL queries you will need to extract the instances from a dataset in SPARQL_Queries. In our case, we used the AGROVOC triples dataset (listed under References). After extracting the instances using SPARQL, check the STM GitHub page to prepare the training data for STM.

Evaluation
To reproduce the results, you can download the models and the evaluation dataset from Google Drive.

References:
GEMSEC: Graph Embedding with Self Clustering
Specialization Tensor Model (STM)
Stanford Parser
TensorFlow
AGROVOC
GloVe: Global Vectors for Word Representation
Apache Jena

Acknowledgments: We would like to acknowledge the partial support of NSF Grant No. 1747751.

Publications:
Mohamed Gharibi, Arun Zachariah, Praveen Rao - FoodKG: A Software Tool to Enrich Knowledge Graphs on Food Datasets. In Frontiers in Big Data (Data Mining and Management), 19 pages, 2020. (in press)

Front. Big Data doi: 10.3389/fdata.2020.00012
