About label-agreement
DuanhaoranCC opened this issue · comments
1: The two attention expressions are written differently but calculate the same attention. Why, then, does σ operate in DP but not in GO?
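To make my first question concrete, here is my understanding of the two scoring functions, sketched with hypothetical toy inputs (the feature dimension, random vectors, and slope are my own assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
F = 8  # transformed feature dimension (hypothetical)
h_i, h_j = rng.standard_normal(F), rng.standard_normal(F)  # stand-ins for Wh_i, Wh_j
a = rng.standard_normal(2 * F)  # attention vector used by GO

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

# GO (original GAT): e_ij = LeakyReLU(a^T [Wh_i || Wh_j])
e_go = leaky_relu(a @ np.concatenate([h_i, h_j]))

# DP (dot-product): e_ij = Wh_i . Wh_j, with no nonlinearity inside the score
e_dp = h_i @ h_j

print(e_go, e_dp)
```

As the sketch shows, the nonlinearity sits inside the GO score but is absent from the DP score, which is exactly the asymmetry I am asking about.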
2: When you answered question 1 in the experiments section, you cited (link). How is the last entropy-visualization figure (link) computed? The note in the text says: "one can see that the attention values learned are quite similar to a uniform distribution (i.e., all neighbors are equally important)." Is this the entropy of each node's attention at a certain layer? And isn't a uniform distribution a straight line? The plot does not look evenly distributed, so what is the connection between the two?

This question may help me understand the significance of introducing the uniform distribution in Figure 2 of your paper. The figure says the original GAT (GO) captures label agreement better than the DP mode. Why not also add SD and MX for comparison, or is this part only meant to show that GO is better than DP?
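For reference, this is how I currently understand the per-node entropy computation (the attention weights below are made-up numbers, not from your experiments):

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (natural log)."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Hypothetical attention weights of one node over its 4 neighbors
alpha = np.array([0.30, 0.25, 0.25, 0.20])
# Uniform distribution over the same 4 neighbors
uniform = np.full(4, 0.25)

# The uniform case attains the maximum entropy log(deg) for that node
print(entropy(alpha), entropy(uniform))
```

If this is right, the uniform "reference" entropy depends on each node's degree, so I would like to confirm whether the figure plots a histogram of these per-node entropies for one layer.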
Sorry for taking up your time, and thank you very much for your answer!