What does the KnowledgeGraph do to build?
MikeDoes opened this issue
from torchkge.data_structures import KnowledgeGraph
https://github.com/torchkge-team/torchkge/blob/master/torchkge/data_structures.py
I noticed that it takes a significant amount of time to build. Are there academic works on implementing graphs efficiently that TorchKGE makes use of?
As I understand it, it creates a knowledge graph tensor from the list of knowledge graph triplets. Is this correct?
Hello,
At the core of a `torchkge.data_structures.KnowledgeGraph` object, there are only three tensors that contain the information of the triplets (`head_idx`, `tail_idx` and `relations`). These are Long tensors containing only the indices of entities and relations. What can take some time is building the dictionaries that contain the mapping between the entity or relation indices and the labels contained in the dataframe given as input.
More precisely, building a `torchkge.data_structures.KnowledgeGraph` object from a Pandas dataframe takes three steps:
- create two dictionaries, `ent2ix` and `rel2ix`, that map the entity and relation labels contained in the dataframe to a numerical index
- create the tensors containing the indices of entities and relations using the previously built dictionaries (quite fast)
- create two dictionaries, `dict_of_heads` and `dict_of_tails`, that contain, for each head-relation pair (resp. tail-relation pair), the set of possible tails (resp. heads).
Building the three tensors is quite fast but creating the four dictionaries can take some time.
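To make the three steps concrete, here is a minimal sketch in plain Python. It is not the TorchKGE implementation: the toy triplets and the `build_index` helper are made up for illustration, plain lists stand in for the Long tensors, and which dictionary is keyed by which pair is an assumption on my side.

```python
from collections import defaultdict

# Toy triplets (head label, relation label, tail label); hypothetical data.
triples = [
    ("paris", "capital_of", "france"),
    ("berlin", "capital_of", "germany"),
    ("paris", "located_in", "france"),
    ("lyon", "located_in", "france"),
]


def build_index(labels):
    """Hypothetical helper: map each distinct label to an integer index."""
    return {label: ix for ix, label in enumerate(sorted(set(labels)))}


# Step 1: label -> index dictionaries for entities and relations.
ent2ix = build_index([h for h, _, _ in triples] + [t for _, _, t in triples])
rel2ix = build_index([r for _, r, _ in triples])

# Step 2: index sequences for heads, tails and relations.
# In TorchKGE these are Long tensors; plain lists are used here instead.
head_idx = [ent2ix[h] for h, _, _ in triples]
tail_idx = [ent2ix[t] for _, _, t in triples]
relations = [rel2ix[r] for _, r, _ in triples]

# Step 3: for each (entity, relation) pair, collect the entities seen on the
# other side of the triplet. Assumed keying: dict_of_tails by (head, relation),
# dict_of_heads by (tail, relation).
dict_of_heads = defaultdict(set)
dict_of_tails = defaultdict(set)
for h, r, t in zip(head_idx, relations, tail_idx):
    dict_of_tails[(h, r)].add(t)
    dict_of_heads[(t, r)].add(h)
```

Step 2 is a plain lookup per triplet, whereas steps 1 and 3 each build a Python dictionary, which fits the observation that the dictionaries, not the tensors, dominate the build time.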