sisinflab / Augmented-and-Linked-Open-Datasets-for-Recommendation

Knowledge Graph Datasets for Recommendation

This is the official repository of the paper "Knowledge Graph Datasets for Recommendation", accepted for publication at KaRS@RecSys2023.

This work covers the enrichment of two widely used recommendation datasets from the movie and book domains, MovieLens 25M and LibraryThing respectively. Specifically:

  • we link the items in the LibraryThing (LT) and MovieLens 25M (ML25M) datasets with the entities available in three well-known knowledge graphs: Wikidata, DBpedia, and Freebase;
  • starting from the linked item entities, we explore the Wikidata and DBpedia knowledge graphs up to two hops to collect all the structured information connected to these resources, thus providing persistent, ready-to-use enriched datasets for reproducible experiments.

Inspired by advances in knowledge graph, Graph Convolutional Network, link prediction, and recommender system research, these augmented datasets aim to meet the needs of cutting-edge work in those areas. They also pave the way for further research that investigates different recommendation modalities simultaneously.

Download the Datasets

All the resources are available here.

Please note that the resources cannot be hosted on GitHub due to GitHub size limits.

Resources

Our resources collect:

  • links from item IDs to URIs on the Wikidata, DBpedia, and Freebase KGs for both movies and books
  • RDF triples from the Wikidata and DBpedia KGs for both movies and books

The files are split into zip archives as follows:

MovieLens 25M
├── ml25m_linking.zip
│   └── ml25m_linking.tsv
└── ml25m_subgraphs.zip
    ├── ml25m_wikidata_1hop.tsv
    ├── ml25m_wikidata_2hop.tsv
    ├── ml25m_dbpedia_1hop.tsv
    └── ml25m_dbpedia_2hop.tsv


LibraryThing
├── lt_linking.zip
│   ├── lt_linking.tsv
│   ├── lt_wikidata_freebase_linking.tsv
│   └── lt_dbpedia_freebase_linking.tsv
└── lt_subgraphs.zip
    ├── lt_wikidata_1hop.tsv
    ├── lt_wikidata_2hop.tsv
    ├── lt_dbpedia_1hop.tsv
    └── lt_dbpedia_2hop.tsv
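Once an archive has been downloaded, a file can be inspected without unpacking the whole zip. The snippet below is a minimal sketch using only the Python standard library. The helper name `head` is hypothetical, and it assumes the files are UTF-8 encoded and looks members up by base name, since the exact paths inside the archives are not documented here. The demo builds a tiny in-memory archive with made-up content instead of reading a real download.

```python
import io
import zipfile
from itertools import islice


def head(archive, filename, n=5):
    """Return the first n lines of the member whose base name is `filename`.

    `archive` may be a path or a binary file-like object.
    UTF-8 encoding is an assumption, not something the README states.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Match on the base name because the internal folder layout is unknown.
            if name.rsplit("/", 1)[-1] == filename:
                with zf.open(name) as raw:
                    text = io.TextIOWrapper(raw, encoding="utf-8")
                    return [line.rstrip("\n") for line in islice(text, n)]
    raise FileNotFoundError(filename)


# Demo on an in-memory archive with made-up content.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr(
        "ml25m_linking.tsv",
        "movie_id\twikidata_uri\n1\thttp://example.org/wd/A\n",
    )

first_lines = head(buffer, "ml25m_linking.tsv", n=1)
```

With a real download, the first argument would be the path to the zip file, for example `head("ml25m_linking.zip", "ml25m_linking.tsv")`.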

Resources Description

Here we provide a description of the contents of our collection.

File Name Descriptions
MovieLens 25M
ml25m_linking.tsv This file links the items in the MovieLens 25M dataset to the Wikidata, DBpedia, and Freebase knowledge graphs. It is a tab-separated file with the following fields:
  • movie_id : the movie identifier in the MovieLens 25M dataset
  • wikidata_uri : the URI of the Wikidata resource associated with the movie
  • dbpedia_uri : the URI of the DBpedia resource associated with the movie
  • freebase_uri : the URI of the Freebase resource associated with the movie
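The linking file can be read with the standard `csv` module. The following is a rough sketch, not the authors' loader: the function name `load_linking` is hypothetical, and it assumes the file starts with a header row carrying the field names listed above and that missing links are stored as empty cells. The sample rows use made-up `example.org` URIs.

```python
import csv
import io


def load_linking(handle, id_field="movie_id"):
    """Map each item id to a dict holding only its non-empty KG URIs.

    Assumes a header row with the documented field names.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    links = {}
    for row in reader:
        item_id = row[id_field]
        # Drop the id column and any empty or missing cells.
        links[item_id] = {
            field: value
            for field, value in row.items()
            if field != id_field and value
        }
    return links


# Made-up sample standing in for ml25m_linking.tsv.
sample = (
    "movie_id\twikidata_uri\tdbpedia_uri\tfreebase_uri\n"
    "1\thttp://example.org/wd/A\thttp://example.org/db/A\thttp://example.org/fb/A\n"
    "2\thttp://example.org/wd/B\t\t\n"
)
links = load_linking(io.StringIO(sample))
```

For a real file, pass an open handle instead, for example `open("ml25m_linking.tsv", encoding="utf-8", newline="")`.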
ml25m_wikidata_1hop.tsv This file contains the RDF triples gathered by exploring the Wikidata knowledge graph up to 1 hop, starting from the URIs found in the item linking phase for the MovieLens 25M dataset. It is a tab-separated file with the following fields:
  • subject : the URI of the subject in the RDF triple at 1-hop
  • predicate : the URI of the predicate in the RDF triple at 1-hop
  • object : the URI of the object in the RDF triple at 1-hop
ml25m_wikidata_2hop.tsv This file contains the RDF triples gathered by exploring the Wikidata knowledge graph up to 2 hops, starting from the object URIs found in the 1-hop exploration for the MovieLens 25M dataset. It is a tab-separated file with the following fields:
  • subject : the URI of the subject in the RDF triple at 2-hop (i.e., the object at the 1-hop exploration)
  • predicate : the URI of the predicate in the RDF triple at 2-hop
  • object : the URI of the object in the RDF triple at 2-hop
ml25m_dbpedia_1hop.tsv This file contains the RDF triples gathered by exploring the DBpedia knowledge graph up to 1 hop, starting from the URIs found in the item linking phase for the MovieLens 25M dataset. It is a tab-separated file with the following fields:
  • subject : the URI of the subject in the RDF triple at 1-hop
  • predicate : the URI of the predicate in the RDF triple at 1-hop
  • object : the URI of the object in the RDF triple at 1-hop
ml25m_dbpedia_2hop.tsv This file contains the RDF triples gathered by exploring the DBpedia knowledge graph up to 2 hops, starting from the object URIs found in the 1-hop exploration for the MovieLens 25M dataset. It is a tab-separated file with the following fields:
  • subject : the URI of the subject in the RDF triple at 2-hop (i.e., the object at the 1-hop exploration)
  • predicate : the URI of the predicate in the RDF triple at 2-hop
  • object : the URI of the object in the RDF triple at 2-hop
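Because every 2-hop subject is an object reached at 1-hop, the two files can be chained into item-to-entity paths. The sketch below illustrates one way to do this with an in-memory index; the names `read_triples` and `two_hop_paths` are hypothetical. Whether the files carry a header row is not stated here, so the reader skips a `subject/predicate/object` header if one is present. The sample triples use made-up `ex:` identifiers.

```python
import csv
import io
from collections import defaultdict


def read_triples(handle):
    """Yield (subject, predicate, object) tuples from a tab-separated file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip malformed rows and an optional header row (presence is an assumption).
        if len(row) != 3 or row == ["subject", "predicate", "object"]:
            continue
        yield tuple(row)


def two_hop_paths(one_hop, two_hop):
    """Join 1-hop objects with 2-hop subjects, yielding 5-tuples (s, p1, o1, p2, o2)."""
    by_subject = defaultdict(list)
    for subject, predicate, obj in two_hop:
        by_subject[subject].append((predicate, obj))
    for subject, predicate, obj in one_hop:
        for predicate2, obj2 in by_subject.get(obj, ()):
            yield (subject, predicate, obj, predicate2, obj2)


# Made-up samples standing in for the 1-hop and 2-hop files.
one_hop_sample = (
    "subject\tpredicate\tobject\n"
    "ex:film1\tex:director\tex:person1\n"
    "ex:film1\tex:genre\tex:genre1\n"
)
two_hop_sample = "ex:person1\tex:bornIn\tex:city1\n"

paths = list(
    two_hop_paths(
        read_triples(io.StringIO(one_hop_sample)),
        read_triples(io.StringIO(two_hop_sample)),
    )
)
```

The index holds the whole 2-hop file in memory, which may not be practical for the full subgraphs; a database or chunked processing would be a reasonable alternative in that case.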
LibraryThing
lt_linking.tsv This file links the items in the LibraryThing dataset to the Wikidata and DBpedia knowledge graphs. It is a tab-separated file with the following fields:
  • work_id : the book identifier in the LibraryThing dataset
  • wikidata_uri : the URI of the Wikidata resource associated with the book
  • wikidata_similarity : the similarity value between the side information in the dataset and in Wikidata
  • dbpedia_uri : the URI of the DBpedia resource associated with the book
  • dbpedia_similarity : the similarity value between the side information in the dataset and in DBpedia
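The similarity columns make it possible to keep only the links above a chosen confidence level. The sketch below is one way to do that; the function name `confident_links` and the default threshold are hypothetical, and it assumes a header row and that the similarity is a number where higher means a closer match. The scale of the real values is not documented here, so the threshold should be chosen after inspecting the data. The sample rows are made up.

```python
import csv
import io


def confident_links(handle, kg="wikidata", min_similarity=0.9):
    """Return {work_id: uri} for rows whose `<kg>_similarity` reaches the threshold.

    Assumes higher similarity means a better match (not confirmed by the README).
    """
    uri_field, sim_field = f"{kg}_uri", f"{kg}_similarity"
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    kept = {}
    for row in reader:
        uri, similarity = row.get(uri_field), row.get(sim_field)
        if not uri or not similarity:
            continue  # no link for this knowledge graph
        try:
            score = float(similarity)
        except ValueError:
            continue  # non-numeric cell, skip rather than guess
        if score >= min_similarity:
            kept[row["work_id"]] = uri
    return kept


# Made-up sample standing in for lt_linking.tsv.
lt_sample = (
    "work_id\twikidata_uri\twikidata_similarity\tdbpedia_uri\tdbpedia_similarity\n"
    "10\thttp://example.org/wd/A\t0.95\thttp://example.org/db/A\t0.60\n"
    "11\thttp://example.org/wd/B\t0.40\t\t\n"
    "12\t\t\thttp://example.org/db/C\t1.0\n"
)
wikidata_links = confident_links(io.StringIO(lt_sample), kg="wikidata")
dbpedia_links = confident_links(io.StringIO(lt_sample), kg="dbpedia")
```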
lt_wikidata_freebase_linking.tsv This file links the items in the LibraryThing dataset to the Freebase knowledge graph through Wikidata. It is a tab-separated file with the following fields:
  • work_id : the book identifier in the LibraryThing dataset
  • wikidata_uri : the URI of the Wikidata resource associated with the book
  • wikidata_similarity : the similarity value between the side information in the dataset and in Wikidata
  • freebase_uri : the URI of the Freebase resource associated with the book, obtained from the Wikidata URI
lt_dbpedia_freebase_linking.tsv This file links the items in the LibraryThing dataset to the Freebase knowledge graph through DBpedia. It is a tab-separated file with the following fields:
  • work_id : the book identifier in the LibraryThing dataset
  • dbpedia_uri : the URI of the DBpedia resource associated with the book
  • dbpedia_similarity : the similarity value between the side information in the dataset and in DBpedia
  • freebase_uri : the URI of the Freebase resource associated with the book, obtained from the DBpedia URI
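Since Freebase links are provided through two independent routes, a simple consistency check is to compare them per book. The sketch below is only an illustration of that idea; the names `freebase_by_work` and `compare_routes` are hypothetical, both files are assumed to have a header row, and the sample rows are made up.

```python
import csv
import io


def freebase_by_work(handle):
    """Return {work_id: freebase_uri}, skipping rows without a Freebase link."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {
        row["work_id"]: row["freebase_uri"]
        for row in reader
        if row.get("freebase_uri")
    }


def compare_routes(via_wikidata, via_dbpedia):
    """Split the books linked through both routes into agreeing and conflicting."""
    agree, conflict = {}, {}
    for work_id in via_wikidata.keys() & via_dbpedia.keys():
        from_wd, from_db = via_wikidata[work_id], via_dbpedia[work_id]
        if from_wd == from_db:
            agree[work_id] = from_wd
        else:
            conflict[work_id] = (from_wd, from_db)
    return agree, conflict


# Made-up samples standing in for the two Freebase linking files.
wd_sample = (
    "work_id\twikidata_uri\twikidata_similarity\tfreebase_uri\n"
    "10\thttp://example.org/wd/A\t0.95\thttp://example.org/fb/A\n"
    "11\thttp://example.org/wd/B\t0.80\thttp://example.org/fb/B\n"
)
db_sample = (
    "work_id\tdbpedia_uri\tdbpedia_similarity\tfreebase_uri\n"
    "10\thttp://example.org/db/A\t0.90\thttp://example.org/fb/A\n"
    "11\thttp://example.org/db/B\t0.70\thttp://example.org/fb/X\n"
    "12\thttp://example.org/db/C\t0.85\thttp://example.org/fb/C\n"
)
agree, conflict = compare_routes(
    freebase_by_work(io.StringIO(wd_sample)),
    freebase_by_work(io.StringIO(db_sample)),
)
```

Books that appear in only one of the two files (such as `12` in the sample) are left out of both results, since there is nothing to compare them against.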
lt_wikidata_1hop.tsv This file contains the RDF triples gathered by exploring the Wikidata knowledge graph up to 1 hop, starting from the URIs found in the item linking phase for the LibraryThing dataset. It is a tab-separated file with the following fields:
  • subject : the URI of the subject in the RDF triple at 1-hop
  • predicate : the URI of the predicate in the RDF triple at 1-hop
  • object : the URI of the object in the RDF triple at 1-hop
lt_wikidata_2hop.tsv This file contains the RDF triples gathered by exploring the Wikidata knowledge graph up to 2 hops, starting from the object URIs found in the 1-hop exploration for the LibraryThing dataset. It is a tab-separated file with the following fields:
  • subject : the URI of the subject in the RDF triple at 2-hop (i.e., the object at the 1-hop exploration)
  • predicate : the URI of the predicate in the RDF triple at 2-hop
  • object : the URI of the object in the RDF triple at 2-hop
lt_dbpedia_1hop.tsv This file contains the RDF triples gathered by exploring the DBpedia knowledge graph up to 1 hop, starting from the URIs found in the item linking phase for the LibraryThing dataset. It is a tab-separated file with the following fields:
  • subject : the URI of the subject in the RDF triple at 1-hop
  • predicate : the URI of the predicate in the RDF triple at 1-hop
  • object : the URI of the object in the RDF triple at 1-hop
lt_dbpedia_2hop.tsv This file contains the RDF triples gathered by exploring the DBpedia knowledge graph up to 2 hops, starting from the object URIs found in the 1-hop exploration for the LibraryThing dataset. It is a tab-separated file with the following fields:
  • subject : the URI of the subject in the RDF triple at 2-hop (i.e., the object at the 1-hop exploration)
  • predicate : the URI of the predicate in the RDF triple at 2-hop
  • object : the URI of the object in the RDF triple at 2-hop
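A common use of these files is to turn the 1-hop triples into per-item side information for a recommender. The sketch below joins a linking map with a stream of triples; the function name `item_features` is hypothetical, and it assumes that the subject of a 1-hop triple is written exactly like the URI in the linking file. The inputs are small made-up structures rather than the real files.

```python
from collections import defaultdict


def item_features(links, triples):
    """Return {item_id: set of (predicate, object)} from 1-hop triples.

    `links` maps item ids to KG URIs; `triples` is an iterable of
    (subject, predicate, object) tuples.
    """
    # Several items may point at the same URI, so index URIs to lists of ids.
    uri_to_items = defaultdict(list)
    for item_id, uri in links.items():
        uri_to_items[uri].append(item_id)

    features = defaultdict(set)
    for subject, predicate, obj in triples:
        for item_id in uri_to_items.get(subject, ()):
            features[item_id].add((predicate, obj))
    return dict(features)


# Made-up linking map and triples.
demo_links = {
    "10": "http://example.org/wd/A",
    "11": "http://example.org/wd/B",
}
demo_triples = [
    ("http://example.org/wd/A", "ex:author", "ex:person1"),
    ("http://example.org/wd/A", "ex:genre", "ex:genre1"),
    ("http://example.org/wd/C", "ex:genre", "ex:genre1"),
]
features = item_features(demo_links, demo_triples)
```

Items without any matching triple (such as `11` in the sample) simply do not appear in the result.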

Resources Statistics

The table below shows the statistics of the collected resources, categorized by dataset and data source.

[Image: table of resource statistics by dataset and data source]

Contributing

We welcome any contribution that could improve our datasets. Please contact us by email.

The Team

This work was developed by

* Corresponding authors

License

This work is released under the Apache License 2.0.

Acknowledgements

Our datasets are constructed thanks to
