graphdeeplearning / graphtransformer

Graph Transformer Architecture. Source code for "A Generalization of Transformer Networks to Graphs", DLG-AAAI'21.

Home Page: https://arxiv.org/abs/2012.09699

Why did you divide this term?

sperfu opened this issue

Hi there,

I was reading your graphtransformer code and I'm curious about the operation shown below. Why do you divide wV by z (the so-called 'score' term)? I don't see this term in Equation 4 or Equation 9 of the paper. Could you explain it?

h_out = g.ndata['wV'] / (g.ndata['z'] + torch.full_like(g.ndata['z'], 1e-6)) # adding eps to all values here
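One way to read the line above is as a deferred softmax denominator. This assumes, as in the repository's attention layer, that wV_i accumulates exp(s_ij) V_j over the incoming edges of node i and z_i accumulates exp(s_ij), where s_ij is the scaled query-key score for edge j -> i. Under that assumption the division recovers the softmax-weighted sum:

```latex
\mathrm{wV}_i = \sum_{j \in \mathcal{N}(i)} \exp(s_{ij})\, V_j,
\qquad
z_i = \sum_{j \in \mathcal{N}(i)} \exp(s_{ij}),
\qquad
\frac{\mathrm{wV}_i}{z_i}
  = \sum_{j \in \mathcal{N}(i)} \frac{\exp(s_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(s_{ik})}\, V_j
  = \sum_{j \in \mathcal{N}(i)} \operatorname{softmax}_j(s_{ij})\, V_j
```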

Thanks

Hi @sperfu, it is part of the softmax. Please refer to this issue for pointers to the explanation.
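A minimal pure-Python sketch of the reply's point follows. It assumes a single node with three incoming edges, made-up scores and value vectors, and the hypothetical helper names aggregate_then_normalize and explicit_softmax_attention (none of these come from the repository). It accumulates exp(score) * V and exp(score) separately, divides, and compares the result with an explicit softmax-weighted sum.

```python
import math


def aggregate_then_normalize(scores, values, eps=1e-6):
    """Hypothetical helper: sum exp(score) * V into wV, sum exp(score) into z, then divide."""
    exp_scores = [math.exp(s) for s in scores]  # unnormalized attention weights
    dim = len(values[0])
    # wV: weighted sum of value vectors using the unnormalized weights
    wV = [sum(e * v[d] for e, v in zip(exp_scores, values)) for d in range(dim)]
    z = sum(exp_scores)  # the softmax denominator, kept separately
    # Adding eps mirrors the 1e-6 in the issue's code line
    return [x / (z + eps) for x in wV]


def explicit_softmax_attention(scores, values):
    """Hypothetical reference: normalize with softmax first, then take the weighted sum."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


scores = [0.5, -1.0, 2.0]                      # made-up scaled dot-product scores
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # made-up value vectors V_j

a = aggregate_then_normalize(scores, values)
b = explicit_softmax_attention(scores, values)
print(a)
print(b)
```

The two results agree up to the small effect of the epsilon. Splitting the softmax this way lets both sums be computed as ordinary sum reductions over edges, with normalization deferred to one per-node division. The epsilon presumably guards nodes that receive no messages, where z would otherwise be 0.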