torchkge-team / torchkge

TorchKGE: Knowledge Graph embedding in Python and PyTorch.

KnowledgeGraph embedding size problem in HolE

zozo170610 opened this issue · comments

  • TorchKGE version: 0.17.5
  • Python version: Python 3.8.10
  • Operating System: colab

I'm using the HolE model you implemented.
A problem occurred while scoring.

Error:
/usr/local/lib/python3.8/dist-packages/torchkge/models/bilinear.py in inference_scoring_function(self, h, t, r)
375 # this is the tail completion case in link prediction
376 h = h.view(b_size, 1, self.emb_dim)
--> 377 hr = matmul(h, r).view(b_size, self.emb_dim, 1)
378 return (hr * t.transpose(1, 2)).sum(dim=1)
379 elif len(h.shape) == 3:

RuntimeError: Expected size for first two dimensions of batch2 tensor to be: [64, 500] but got: [64, 20387].
Here 64 is the batch size, 500 is the embedding size, and 20387 is the number of candidate entities.

So I checked the sizes of candidates, head, tail, and relation:
candidates = torch.Size([64, 500, 500]) b_size, rel_emb_dim, n_ent, dtype: torch.float
head = torch.Size([64, 500]) b_size, rel_emb_dim, dtype: torch.float
tail = torch.Size([64, 500]) b_size, rel_emb_dim, dtype: torch.float
relation = torch.Size([64, 20387, 500]) b_size, rel_emb_dim

The sizes of candidates and relation look weird.
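
To make the mismatch concrete, here is a minimal standalone sketch of the failing matmul, using only the shapes printed above (the tensor names are illustrative, not TorchKGE internals):

```python
import torch

# Shapes taken from the printout above; names are illustrative only.
b_size, emb_dim, n_ent = 64, 500, 20387

h = torch.randn(b_size, 1, emb_dim)                 # head, viewed as (b_size, 1, emb_dim)
r_expected = torch.randn(b_size, emb_dim, emb_dim)  # a shape the scoring step can multiply with
r_observed = torch.randn(b_size, n_ent, emb_dim)    # shape of the relation tensor reported above

print(torch.matmul(h, r_expected).shape)  # torch.Size([64, 1, 500])
torch.matmul(h, r_observed)               # raises the RuntimeError reported above
```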

So I tried evaluation on the FB15k data using your tutorial code and checked the sizes of candidates, head, tail, and relation. This is the result:
candidates = torch.Size([1, 14951, 100]) b_size, n_ent, rel_emb_dim, dtype: torch.float
head = torch.Size([1, 100]) b_size, rel_emb_dim, dtype: torch.float
tail = torch.Size([1, 100]) b_size, rel_emb_dim, dtype: torch.float
relation = torch.Size([1, 100]) b_size, rel_emb_dim

I didn't find any problem with my input data generation.
I wrote the code below to generate the input data.
I tried both the pandas and kg constructor methods (a sketch of the pandas route is included after the helper functions below).
The raw data is a txt file of tab-separated triples in the order head, relation, tail.
Neither approach worked.

```python
import os


def load_data(file_path, name_entity_data, name_relation_data, name_train_data,
              name_valid_data, name_test_data, name_all_data, name_AUC_data):

    # file_path = '/content/drive/MyDrive/

    print("load data from {}".format(file_path))

    # entity dictionary file: one "id\tentity" pair per line
    with open(os.path.join(file_path, name_entity_data)) as f:
        entity2id = dict()
        id2entity = dict()

        for line in f:
            eid, entity = line.strip().split('\t')
            entity2id[entity] = int(eid)
            id2entity[int(eid)] = entity

    # relation dictionary file: one "id\trelation" pair per line
    with open(os.path.join(file_path, name_relation_data)) as f:
        relation2id = dict()
        id2relation = dict()

        for line in f:
            rid, relation = line.strip().split('\t')
            relation2id[relation] = int(rid)
            id2relation[int(rid)] = relation

    kg_train = read_triplets_to_kg(os.path.join(file_path, name_train_data), entity2id, relation2id)
    kg_valid = read_triplets_to_kg(os.path.join(file_path, name_valid_data), entity2id, relation2id)
    kg_test = read_triplets_to_kg(os.path.join(file_path, name_test_data), entity2id, relation2id)
    kg_all = read_triplets_to_kg(os.path.join(file_path, name_all_data), entity2id, relation2id)
    kg_auc = read_triplets_to_kg(os.path.join(file_path, name_AUC_data), entity2id, relation2id)

    print('num_entity: {}'.format(len(entity2id)))
    print('num_relation: {}'.format(len(relation2id)))
    print('num_kg_train: {}'.format(len(kg_train['heads'])))
    print('num_kg_valid: {}'.format(len(kg_valid['heads'])))
    print('num_kg_test: {}'.format(len(kg_test['heads'])))

    return entity2id, relation2id, id2entity, id2relation, kg_train, kg_valid, kg_test, kg_all, kg_auc
```

```python
import torch


def read_triplets_to_kg(file_path, entity2id, relation2id):
    heads = []
    tails = []
    relations = []
    kg = dict()

    # raw data file: one "head\trelation\ttail" triple per line
    with open(file_path) as f:
        for line in f:
            head, relation, tail = line.strip().split('\t')
            heads.append(entity2id[head])
            tails.append(entity2id[tail])
            relations.append(relation2id[relation.strip()])

    kg['heads'] = torch.LongTensor(heads)
    kg['tails'] = torch.LongTensor(tails)
    kg['relations'] = torch.LongTensor(relations)

    return kg
```
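
For reference, the pandas route I mentioned would look roughly like this. This is only a sketch: the 'from'/'to'/'rel' column names are what KnowledgeGraph's df argument expects as far as I understand, and 'train.txt' stands for the tab-separated triple file described above.

```python
import pandas as pd
from torchkge.data_structures import KnowledgeGraph

# Sketch of the pandas route (assumed file layout: head \t relation \t tail).
df = pd.read_csv('train.txt', sep='\t', header=None, names=['from', 'rel', 'to'])
kg = KnowledgeGraph(df=df)
```

Note that a KnowledgeGraph built from a data frame derives its own ent2ix/rel2ix from that frame, so train/valid/test splits are usually obtained by loading everything and then using its split_kg method (if I remember the API correctly) rather than building three independent graphs.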

And this is the relevant part of the model training code:

```python
from torchkge.data_structures import KnowledgeGraph

entity2id, relation2id, id2entity, id2relation, kg_train, kg_valid, kg_test, kg_all, kg_auc = \
    load_data(file_path, name_entity_data, name_relation_data, train, valid, test, all, auc)

kg_train = KnowledgeGraph(kg=kg_train, ent2ix=entity2id, rel2ix=relation2id)
kg_valid = KnowledgeGraph(kg=kg_valid, ent2ix=entity2id, rel2ix=relation2id)
kg_test = KnowledgeGraph(kg=kg_test, ent2ix=entity2id, rel2ix=relation2id)
kg_auc = KnowledgeGraph(kg=kg_auc, ent2ix=entity2id, rel2ix=relation2id)
```
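
The evaluation step that reaches the failing scoring function looks roughly like this (a sketch only: emb_dim=500 and b_size=64 are the values behind the shapes above, and the training loop is omitted):

```python
from torchkge.models import HolEModel
from torchkge.evaluation import LinkPredictionEvaluator

# Sketch of the evaluation step that reaches inference_scoring_function.
# emb_dim=500 and b_size=64 match the shapes reported above; training is omitted.
model = HolEModel(500, kg_train.n_ent, kg_train.n_rel)
# ... train the model here ...

evaluator = LinkPredictionEvaluator(model, kg_test)
evaluator.evaluate(b_size=64)
evaluator.print_results()
```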

Hello, thanks for raising this issue. It should be fixed by PR #249. Could you confirm that it works before the patch is released?

Thank you very much.
The same problem no longer occurs when using HolE.