How to get pre-trained embeddings for my own dataset with entities and relations

Question

AndDoIt opened this issue 2 years ago

Thanks for your excellent work on out-of-graph (OOG) link prediction. Could you please tell me how to obtain pre-trained embeddings for the entities and relations in my own dataset?

Jinheon Baek · Answer 1 · Tue Sep 20 2022 05:43:50 GMT+0800 (China Standard Time)

Thank you for your interest, and sorry for replying late.

We used the DGL-KE library (https://github.com/awslabs/dgl-ke) to pre-train the embeddings of entities and relations in the knowledge graph.
AFAIK, you only need to provide your own dataset, consisting of triples, to this library, and it will give you the trained embeddings for entities and relations.
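
For readers following along, here is a minimal sketch of what "providing your own triples" could look like. It writes tab-separated head/relation/tail files and assembles a `dglke_train` command line. The helper names (`write_triples`, `prepare_dglke_dataset`, `build_dglke_command`), the 80/10/10 split, the dataset name `my_kg`, and every hyperparameter value are assumptions for illustration, not the settings used in the paper; please check the flags against the DGL-KE documentation for your installed version.

```python
import csv
import random
from pathlib import Path


def write_triples(triples, path):
    """Write (head, relation, tail) triples as tab-separated lines."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerows(triples)


def prepare_dglke_dataset(triples, out_dir, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle triples and write train/valid/test TSV files.

    The split fractions are illustrative assumptions, not the paper's values.
    Returns the number of triples written to each file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_frac)
    n_test = int(len(shuffled) * test_frac)
    splits = {
        "valid.tsv": shuffled[:n_valid],
        "test.tsv": shuffled[n_valid:n_valid + n_test],
        "train.tsv": shuffled[n_valid + n_test:],
    }
    for name, rows in splits.items():
        write_triples(rows, out_dir / name)
    return {name: len(rows) for name, rows in splits.items()}


def build_dglke_command(data_dir, dataset_name="my_kg", save_dir="ckpts"):
    """Assemble a dglke_train invocation for user-defined raw triples.

    All hyperparameter values below are placeholders to be tuned.
    """
    return [
        "dglke_train",
        "--model_name", "TransE_l2",
        "--dataset", dataset_name,
        "--data_path", str(data_dir),
        "--format", "raw_udd_hrt",  # raw user-defined data, head/relation/tail order
        "--data_files", "train.tsv", "valid.tsv", "test.tsv",
        "--save_path", str(save_dir),
        "--hidden_dim", "100",
        "--gamma", "10.0",  # margin; see the later discussion on tuning it
        "--lr", "0.1",
        "--batch_size", "1000",
        "--neg_sample_size", "200",
        "--max_step", "10000",
    ]
```

The returned list can be passed to `subprocess.run(...)` or joined into a shell command. After training, DGL-KE should save the entity and relation embeddings as `.npy` files under the save path, which can then be loaded with `numpy.load`.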

AndDoIt · Answer 2 · Wed Sep 21 2022 17:40:05 GMT+0800 (China Standard Time)

Thanks very much for your reply! I am sorry to bother you with two more questions.
1. The supplementary material says that you randomly sample unseen entities that have a relatively small number of triplets, and then divide the sampled unseen entities and their associated triplets into meta-training/validation/test sets. Does this mean that the meta-training set does not come entirely from the raw training set, and that the meta-validation/test sets do not come entirely from the raw validation/test sets, respectively?
2. Since my own KG is smaller and sparser, could you please share more details about the WN18RR setup, or any related settings and training tips?

Jinheon Baek · Answer 3 · Fri Dec 16 2022 14:27:45 GMT+0800 (China Standard Time)

Thanks for your question, and sorry for getting back late.

That means we first sample entities that appear fewer than x times, and then divide them into (meta-)training, validation, and test sets. In the meta-learning setting, you can simply think of the training set as being the same as the meta-training set.
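
The sampling described above could be sketched roughly as follows. This is only an illustration of the idea, not the authors' preprocessing script: the function name `split_unseen_entities`, the threshold `max_count` (the "x" above), the number of sampled entities, and the 80/10/10 ratios are all assumptions.

```python
import random
from collections import Counter


def split_unseen_entities(triples, max_count, num_unseen,
                          ratios=(0.8, 0.1, 0.1), seed=0):
    """Sample infrequent entities as 'unseen' and split them into three sets.

    triples: list of (head, relation, tail) tuples.
    max_count: entities appearing fewer than this many times are candidates.
    """
    # Count how often each entity appears as a head or a tail.
    counts = Counter()
    for head, _, tail in triples:
        counts[head] += 1
        counts[tail] += 1

    # Candidates are sorted so that sampling is reproducible for a given seed.
    candidates = sorted(e for e, c in counts.items() if c < max_count)
    rng = random.Random(seed)
    unseen = rng.sample(candidates, min(num_unseen, len(candidates)))

    n_train = int(len(unseen) * ratios[0])
    n_valid = int(len(unseen) * ratios[1])
    groups = {
        "meta_train": set(unseen[:n_train]),
        "meta_valid": set(unseen[n_train:n_train + n_valid]),
        "meta_test": set(unseen[n_train + n_valid:]),
    }

    # Attach to each group the triples that touch one of its entities.
    # Simplification: a triple linking unseen entities from two different
    # groups would be listed in both; a real pipeline has to decide how
    # to handle that case.
    split_triples = {
        name: [t for t in triples if t[0] in ents or t[2] in ents]
        for name, ents in groups.items()
    }
    return groups, split_triples
```

Triples that touch none of the sampled entities would remain as the graph of seen entities in this sketch.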
As for training tips, in my experience with KGs, tuning the margin hyperparameter of the triplet loss was somewhat important.
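
To make the role of the margin concrete, here is a small sketch of a common margin-based ranking (triplet) loss, `max(0, margin - pos + neg)`, where a higher score means a more plausible triple. The exact loss used in GEN is not spelled out in this thread, so the function `margin_ranking_loss`, the scores, and the candidate margins below are purely illustrative assumptions.

```python
def margin_ranking_loss(pos_scores, neg_scores, margin):
    """Mean hinge loss over paired positive/negative triple scores.

    Higher score = more plausible triple. A pair contributes zero loss
    once the positive score beats the negative score by at least `margin`.
    """
    losses = [max(0.0, margin - p + n) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


# Toy scores (made up) to show how the margin changes the loss.
pos = [2.0, 1.5, 0.5]
neg = [1.0, 1.4, 0.9]
for margin in (0.5, 1.0, 2.0):
    print(f"margin={margin}: loss={margin_ranking_loss(pos, neg, margin):.3f}")
```

A larger margin keeps more pairs "active" in the loss, which may matter more on a small, sparse KG; a simple sweep over a few margin values on the validation set is one way to tune it.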

AndDoIt · Answer 4 · Fri Dec 16 2022 14:46:17 GMT+0800 (China Standard Time)

Thank you very much for your patient reply, I got it! Could you also describe the steps for adapting the baseline models, including Gmatching, FSRL, and MetaR, so that they can be compared with GEN? These baselines split the dataset by sparse relations rather than by entities. How should they be adjusted so that all four models receive the same input and are comparable with each other?

Jinheon Baek · Answer 5 · Fri Dec 16 2022 14:56:44 GMT+0800 (China Standard Time)

Thanks for your follow-up questions!

We simply used the model architectures of the baselines with our own data splits. For example, in the "seen to unseen" category, Gmatching predicts the unseen relation, as implemented in its paper; in the "ours" category, on the other hand, we meta-train it within our framework to predict the unseen entity.

AndDoIt · Answer 6 · Fri Dec 16 2022 15:01:32 GMT+0800 (China Standard Time)

Thanks for your great work and your reply!!!