malllabiisc / CompGCN

ICLR 2020: Composition-Based Multi-Relational Graph Convolutional Networks


About why you pre-calculate all node representations

Punchwes opened this issue · comments

Hi @svjan5 ,
Thanks very much for your amazing work; it's very interesting.

I am a little bit curious about why you pre-calculate all related node representations. In your code, in the message-passing block with propagate(), you seem to compute the representations of all related nodes across the whole dataset before actually seeing them, even though you are using a mini-batch strategy. Am I understanding this correctly?
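For concreteness, here is a rough sketch of the pattern being described (hypothetical names, assuming a PyG-style message-passing layer; this is not the actual repo code):

```python
import torch

class SketchModel(torch.nn.Module):
    def __init__(self, num_ent, num_rel, dim, conv):
        super().__init__()
        self.ent_emb = torch.nn.Parameter(torch.randn(num_ent, dim))
        self.rel_emb = torch.nn.Parameter(torch.randn(num_rel, dim))
        self.conv = conv  # a CompGCN-style message-passing layer (assumed interface)

    def forward(self, sub_idx, rel_idx, edge_index, edge_type):
        # edge_index / edge_type cover ALL triples of the training graph,
        # not just the ones in the current mini-batch.
        all_ent, all_rel = self.conv(self.ent_emb, edge_index, edge_type, self.rel_emb)
        sub = all_ent[sub_idx]      # only the batch's subject entities are gathered
        rel = all_rel[rel_idx]
        return sub, rel, all_ent    # sub ∘ rel is then scored against all entities
```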

This does not quite make sense to me, since there will definitely be unseen entities beyond the instances in your mini-batch. But your aggregation seems to include all possible (sub, obj) pairs across the whole dataset via self.edge_index, so backpropagation will automatically update the parameters for unseen obj entities.
For instance, if your mini-batch contains:

(ent1, relation1, ent2), (ent1, relation2, ent3), (ent4, relation3, ent5)

and the whole dataset contains one more pair involving ent1:

(ent1, relation4, ent6).

In your current strategy, you aggregate (ent6+relation4, ent2+relation1, ent3+relation2) for ent1, but you actually haven't seen ent6 in your mini-batch. Does it really make sense to update the embedding for ent6?

Would you please give me some more details about why you do it like this?

Yes, we compute the CompGCN representations for the entire graph before computing the link prediction scores. When we apply k layers of any GCN model, each node's embedding is influenced by its k-hop neighbors. Thus, the embedding of ent1 depends on ent6's embedding, so ent6's embedding should also be updated. What else should be done in this case?
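A tiny illustration of this point (a deliberately simplified one-layer propagation, not the actual CompGCN code): even when only a batch triple is scored, gradients flow back to neighbors that appear only in the full graph.

```python
import torch

emb = torch.nn.Parameter(torch.randn(7, 4))   # embeddings for ent0..ent6
adj = torch.zeros(7, 7)
adj[1, 6] = 1.0                                # full-graph edge: ent6 contributes to ent1
out = emb + adj @ emb                          # one simplified propagation step

score = (out[1] * out[2]).sum()                # score only the batch triple (ent1, relation1, ent2)
score.backward()
print(emb.grad[6])                             # non-zero: ent6 is updated although it is not in the batch
```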

Oh, I see. It makes sense when the whole dataset is regarded as one graph. Sorry, I am new to this task and had not realised this premise.

Hi, with this approach I find it very easy to run out of CUDA memory when encoding a large knowledge graph like Freebase. Is there any way to avoid loading the whole knowledge graph before mini-batch training? I think this would make the repo easier to apply to large-scale knowledge graphs. Thanks!
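One possible direction (not something this repo implements) is to propagate only over the k-hop subgraph around the entities appearing in each mini-batch, for example with PyG's k_hop_subgraph utility. A rough sketch, with illustrative names:

```python
import torch
from torch_geometric.utils import k_hop_subgraph

def batch_subgraph(batch_ent, edge_index, edge_type, num_hops):
    # batch_ent: entity ids appearing in the current mini-batch
    subset, sub_edge_index, mapping, edge_mask = k_hop_subgraph(
        batch_ent, num_hops, edge_index, relabel_nodes=True)
    # Run the GCN layers only on this subgraph; `mapping` gives the positions
    # of the batch entities inside `subset`.
    return subset, sub_edge_index, edge_type[edge_mask], mapping
```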