torchkge-team / torchkge

TorchKGE: Knowledge Graph embedding in Python and PyTorch.

What does the KnowledgeGraph do to build?

MikeDoes opened this issue

from torchkge.data_structures import KnowledgeGraph
https://github.com/torchkge-team/torchkge/blob/master/torchkge/data_structures.py

I noticed that it takes a significant amount of time to build. Has there been academic work on implementing graphs efficiently that is used in TorchKGE?

As I understand it, it creates a knowledge graph tensor from the list of knowledge graph triplets. Is this correct?

Hello,

At the core of a torchkge.data_structures.KnowledgeGraph object, there are only three tensors that contain the information of the triplets (head_idx, tail_idx and relations). These are Long tensors containing only the indices of entities and relations. What can take some time is building the dictionaries that map the entity and relation labels contained in the input dataframe to their indices.

More precisely, building a torchkge.data_structures.KnowledgeGraph object from a Pandas dataframe takes three steps:

  • create two dictionaries, ent2ix and rel2ix, that map the entity and relation labels contained in the dataframe to a numerical index
  • create the tensors containing the indices of entities and relations using the previously built dictionaries (quite fast)
  • create two dictionaries, dict_of_heads and dict_of_tails, that contain, for each tail-relation pair (resp. head-relation pair), the set of possible heads (resp. tails).

Building the three tensors is quite fast but creating the four dictionaries can take some time.
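
To make the three steps concrete, here is a minimal, dependency-free sketch. It is not the TorchKGE implementation: the toy triplets and the build_index helper are hypothetical, plain Python lists stand in for both the Pandas dataframe and the Long tensors, and the key layout of dict_of_heads and dict_of_tails is an assumption based on their names.

```python
from collections import defaultdict

# Hypothetical toy data: (head label, relation label, tail label).
triplets = [
    ("paris", "capital_of", "france"),
    ("berlin", "capital_of", "germany"),
    ("paris", "located_in", "france"),
    ("paris", "located_in", "europe"),
]


def build_index(labels):
    """Map each distinct label to an integer index (sorted for determinism)."""
    return {label: ix for ix, label in enumerate(sorted(set(labels)))}


# Step 1: label -> index dictionaries for entities and relations.
ent2ix = build_index([h for h, _, _ in triplets] + [t for _, _, t in triplets])
rel2ix = build_index([r for _, r, _ in triplets])

# Step 2: three parallel index sequences, one entry per triplet.
# Lists are used here as stand-ins for the Long tensors.
head_idx = [ent2ix[h] for h, _, _ in triplets]
tail_idx = [ent2ix[t] for _, _, t in triplets]
relations = [rel2ix[r] for _, r, _ in triplets]

# Step 3: lookup dictionaries from (entity, relation) pairs to sets of
# entities. Assumed layout: dict_of_heads is keyed by (tail, relation),
# dict_of_tails is keyed by (head, relation).
dict_of_heads = defaultdict(set)
dict_of_tails = defaultdict(set)
for h, r, t in zip(head_idx, relations, tail_idx):
    dict_of_heads[(t, r)].add(h)
    dict_of_tails[(h, r)].add(t)
```

In this sketch, step 2 is a single pass of dictionary lookups, while steps 1 and 3 have to hash every label or index pair and grow Python dictionaries and sets, which is consistent with the dictionaries being the slower part on large graphs.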