Questions about pretraining subgraphs
Kqiii opened this issue
Hi,
May I ask some questions about the pretraining subgraphs?
- Why do you apply a `** 0.75` exponent to the individual node degrees? What is the benefit of this? (See the first sketch after this list.)
  GCC/gcc/datasets/graph_dataset.py, line 86 in 20398aa
- Here the `replace` option is set to `True`:
  GCC/gcc/datasets/graph_dataset.py, line 89 in 20398aa
  I believe it is likely that some nodes will be sampled twice or more, which might harm the contrastive training. For example, if node v is sampled twice, it yields two query-key pairs, (g_1, g_2) and (g_3, g_4). In contrastive training, (g_1, g_2) is regarded as a positive pair, while (g_1, g_3) is treated as a negative, even though all four subgraphs g_1 to g_4 are sampled from the ego-graph of node v. Would it be better to set this option to `False`? Or did I misunderstand something about the contrastive training process? (See the second sketch after this list.)
- Why is there a max(self.rw_hops, ...) operation? What is the disadvantage of just using the preset self.rw_hops for every node? Moreover, why is there also a `** 0.75` exponent here? (See the third sketch after this list.)
  GCC/gcc/datasets/graph_dataset.py, line 113 in 20398aa
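For the first question, here is my understanding of what the `** 0.75` smoothing does, as a minimal sketch; the degree values are made up, and only the exponent comes from line 86:

```python
import numpy as np

# Made-up degrees for a toy graph; in graph_dataset.py these would come
# from the actual pretraining graphs.
degrees = np.array([1.0, 2.0, 4.0, 100.0])

# Sampling proportional to raw degree: the hub dominates the distribution.
prob_raw = degrees / degrees.sum()

# With the ** 0.75 smoothing, hubs are down-weighted and low-degree nodes
# are up-weighted -- the same trick word2vec uses for negative sampling.
smoothed = degrees ** 0.75
prob_smoothed = smoothed / smoothed.sum()

print(prob_raw.round(3))       # [0.009 0.019 0.037 0.935]
print(prob_smoothed.round(3))  # [0.027 0.045 0.076 0.852]
```

Is flattening the seed distribution, so that hubs do not dominate the sampled subgraphs, the intended benefit?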
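For the second question, a quick estimate of how often `replace=True` yields duplicate seed nodes in a single draw; the graph size and sample count here are invented for illustration:

```python
import numpy as np

num_nodes, num_samples, trials = 100, 32, 10_000

# Fraction of draws in which at least one node appears more than once.
collisions = sum(
    len(np.unique(np.random.choice(num_nodes, num_samples, replace=True)))
    < num_samples
    for _ in range(trials)
)
print(collisions / trials)  # ~0.99 by the birthday bound
```

So on a small graph nearly every draw contains a duplicate, and the (g_1, g_3) false-negative case above would not be a rare corner case.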
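And for the third question, a hypothetical stand-in for the kind of expression I read on line 113; only the max(self.rw_hops, ...) and the `** 0.75` are from the file, while `rw_hops=64` and `scale=2.0` are invented defaults:

```python
def walk_budget(degree, rw_hops=64, scale=2.0):
    # Hypothetical stand-in for line 113: keep the preset floor rw_hops,
    # but give high-degree seeds a larger walk budget, again smoothed by
    # ** 0.75 so that hubs do not blow up the walk length.
    return max(rw_hops, int(scale * degree ** 0.75))

print(walk_budget(4))     # 64   -- a low-degree seed falls back to the floor
print(walk_budget(5000))  # 1189 -- a hub gets a longer random walk
```

If I read it this way, the max(...) only ever raises the budget above the preset, so I am asking what would go wrong with a fixed self.rw_hops for all nodes.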
Thank you very much!