graphdeeplearning / graphtransformer

Graph Transformer Architecture. Source code for "A Generalization of Transformer Networks to Graphs", DLG-AAAI'21.

Home Page: https://arxiv.org/abs/2012.09699

Technical question

DevinKreuzer opened this issue

Hi, thanks for the great paper :)

I was just curious about the 'z' variable on line 59 of graph_transformer_layer.py. I cannot seem to find its equivalent in the paper. It seems you are normalizing the output heads by the sum of the attention weights?

Would appreciate a little pointer :)

Thanks,
Devin

Hi @DevinKreuzer, thanks for your question.
We follow the DGL implementation with built-in functions, as described in detail here.

The 'z' is the denominator of the softmax: the attention-weighted values are accumulated unnormalized, and dividing by 'z' (the sum of the exponentiated attention scores over each node's incoming edges) completes the normalization.
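For concreteness, here is a minimal sketch (not the repo's exact code) of that two-step softmax using DGL built-in functions. Writing the attention as softmax(w_ij) = exp(w_ij) / sum_k exp(w_ik), the layer accumulates a numerator wV_i = sum_j exp(w_ij) * V_j and a denominator z_i = sum_j exp(w_ij) by message passing, then divides. The field names 'score', 'V_h', 'wV' and 'z' mirror graph_transformer_layer.py; the graph and tensor sizes below are made up for illustration.

```python
import torch
import dgl
import dgl.function as fn

def propagate_attention(g):
    # 'score' on each edge holds the exponentiated (unnormalized)
    # attention weight, i.e. the softmax numerator for that edge.
    # Weighted-value numerator: wV_i = sum_j score_ij * V_j
    g.update_all(fn.u_mul_e('V_h', 'score', 'm'), fn.sum('m', 'wV'))
    # Softmax denominator: z_i = sum_j score_ij
    g.update_all(fn.copy_e('score', 'm'), fn.sum('m', 'z'))

# Toy usage: a fully connected 3-node graph, 2 heads of dimension 4.
src = torch.tensor([0, 0, 1, 1, 2, 2])
dst = torch.tensor([1, 2, 0, 2, 0, 1])
g = dgl.graph((src, dst))
g.ndata['V_h'] = torch.randn(3, 2, 4)          # value vectors, per head
g.edata['score'] = torch.randn(6, 2, 1).exp()  # exp'd raw scores, > 0
propagate_attention(g)

# Dividing by 'z' completes the softmax: each head's output is the
# attention-weighted average of the neighbours' value vectors.
h_out = g.ndata['wV'] / g.ndata['z']
print(h_out.shape)  # torch.Size([3, 2, 4])
```

Splitting the softmax this way lets both terms be computed with sparse message passing over the existing edges, rather than materializing a dense node-by-node attention matrix.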

Hope the referenced article makes this clear and shows that there is no inconsistency with the equations in the paper.

Cheers,
Vijay

Closing the issue!
Feel free to reopen if you have further questions.