THUDM / GCC

GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training @ KDD 2020


Questions about pretraining subgraphs

Kqiii opened this issue · comments


Hi,

May I ask some questions about the pretraining subgraphs?

  1. Why do you apply a (** 0.75) operation to the individual node degrees? What is the benefit of this? My current guess is sketched in the first snippet after this list.

    degrees = torch.cat([g.in_degrees().double() ** 0.75 for g in self.graphs])

  2. Here, the "replace" option is set to True:

    self.length, size=self.num_samples, replace=True, p=prob.numpy()

    I believe it is likely that some nodes will be sampled two or more times, which might harm the contrastive training process. For example, if node v is sampled twice, it gets two query-key pairs, (g_1, g_2) and (g_3, g_4). In contrastive training, (g_1, g_2) is treated as a positive pair while (g_1, g_3) is treated as a negative pair, even though all four subgraphs, i.e., g_1 to g_4, are sampled from the ego-graph of the same node v. Would it be better to set this option to False? Or did I misunderstand something about the contrastive training process? (A small sketch of the duplicate sampling is included after this list.)

  3. Why is there a max(self.rw_hops, ...) operation? What is the disadvantage of just using the preset self.rw_hops for every node? Moreover, why is there also a (** 0.75) operation here? (My reading of the expression is sketched in the last snippet after this list.)

    max_nodes_per_seed = max(
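
About question 1: my guess (not from the paper or the code, just my reading) is that raising the degrees to the power 0.75 flattens the seed-sampling distribution, so hub nodes are not chosen overwhelmingly often, similar to the unigram ** 0.75 smoothing used for negative sampling in word2vec. A tiny sketch of the effect:

    import torch

    # Toy degree vector: one hub node and several low-degree nodes.
    degrees = torch.tensor([1000.0, 10.0, 10.0, 10.0, 10.0])

    raw_prob = degrees / degrees.sum()
    smoothed_prob = degrees ** 0.75 / (degrees ** 0.75).sum()

    print(raw_prob)       # the hub gets ~0.96 of the probability mass
    print(smoothed_prob)  # the hub drops to ~0.89; tail nodes are sampled more often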
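
About question 2: assuming the call above is numpy-style weighted sampling (the sizes and probabilities below are made up for illustration), sampling with replace=True routinely returns duplicate seeds, which is what my concern is based on:

    import numpy as np

    rng = np.random.default_rng(0)
    num_nodes, num_samples = 100, 64

    # Uniform weights just for illustration; the real code uses the smoothed degrees.
    prob = np.full(num_nodes, 1.0 / num_nodes)

    idx = rng.choice(num_nodes, size=num_samples, replace=True, p=prob)
    duplicates = num_samples - len(np.unique(idx))
    print(f"{duplicates} of the {num_samples} sampled seeds are repeats")

Each repeated seed contributes two query-key pairs to the batch, and pairs across the two copies are then treated as negatives even though all of their subgraphs come from the same ego-graph.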
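
About question 3: the helper and the scale factor below are hypothetical, not a quote of the repo code; they only sketch how I currently read the shape of the expression, namely a per-seed budget that grows with degree ** 0.75 but never drops below the preset self.rw_hops.

    def max_nodes_per_seed(out_degree, rw_hops=64, scale=2.0):
        # Hypothetical sketch: a walk budget that scales with the seed's
        # degree (smoothed by ** 0.75) and is floored at rw_hops.
        # `scale` is a placeholder for whatever factor the implementation uses.
        return max(rw_hops, int(scale * out_degree ** 0.75 + 0.5))

    print(max_nodes_per_seed(5))     # low-degree seed: the rw_hops floor applies -> 64
    print(max_nodes_per_seed(2000))  # high-degree hub: budget scales with degree -> 598

If that reading is correct, my question is whether the degree-dependent part matters much in practice compared to a fixed self.rw_hops.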

Thank you very much!