graphdeeplearning / graphtransformer

Graph Transformer Architecture. Source code for "A Generalization of Transformer Networks to Graphs", DLG-AAAI'21.

Home Page: https://arxiv.org/abs/2012.09699


Attention Matrix

pbonazzi opened this issue

Hi! Congratulations on your paper, and thank you for making the implementation publicly available.

Quick question about this function:

    def src_dot_dst(src_field, dst_field, out_field):  # outer wrapper restored for context
        def func(edges):
            # element-wise product of K at the source node and Q at the destination node, per edge
            return {out_field: (edges.src[src_field] * edges.dst[dst_field])}
        return func

Why do you do an element-wise multiplication of K and Q and not a dot product? The scores have dimensions [num_edges, num_heads, hidden_dim/num_heads], but I expected a [num_edges, num_edges] matrix.
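For reference, here is a minimal sketch of how I understand an edge UDF like this is applied with DGL's apply_edges; the toy graph, field names ('K_h', 'Q_h', 'score'), and shapes are my own illustrative assumptions, not necessarily the repo's:

    import dgl
    import torch

    def src_dot_dst(src_field, dst_field, out_field):
        def func(edges):
            return {out_field: (edges.src[src_field] * edges.dst[dst_field])}
        return func

    # Hypothetical toy graph: 3 nodes, 3 directed edges
    g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 0])))
    num_heads, head_dim = 8, 16
    g.ndata['K_h'] = torch.randn(g.num_nodes(), num_heads, head_dim)
    g.ndata['Q_h'] = torch.randn(g.num_nodes(), num_heads, head_dim)

    # For each edge (u, v) this stores K_h[u] * Q_h[v] element-wise, so the
    # result lives on edges with shape [num_edges, num_heads, head_dim];
    # no dense [num_nodes, num_nodes] attention matrix is ever built
    g.apply_edges(src_dot_dst('K_h', 'Q_h', 'score'))
    print(g.edata['score'].shape)  # torch.Size([3, 8, 16])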

You can also reach me here: pietrobonazzi.edu@gmail.com
Hope to hear from you soon, Pietro Bonazzi

Hi, I also don't understand why a Hadamard product of K and Q is used here. Have you figured it out?

Hi @pbonazzi, @GaichaoLee,
After the element-wise multiplication in the code snippet you quoted above, a sum is applied across the feature dimension (d = hidden_dim/num_heads) to obtain the final scalar scores. Effectively, it is a dot product.
The element-wise multiplication also makes it possible to maintain a d-dimensional edge feature, which is used in the GraphTransformer layer with edge features.
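To illustrate, a minimal sketch with made-up shapes (independent of the repo's code): summing the element-wise product over the feature dimension yields exactly the per-edge dot products.

    import torch

    num_edges, num_heads, d = 5, 8, 16  # d = hidden_dim / num_heads
    K_src = torch.randn(num_edges, num_heads, d)  # K at each edge's source node
    Q_dst = torch.randn(num_edges, num_heads, d)  # Q at each edge's destination node

    # Element-wise product: keeps a d-dimensional feature per edge and head
    hadamard = K_src * Q_dst                      # [num_edges, num_heads, d]

    # Summing over the feature dimension gives the scalar attention scores
    scores = hadamard.sum(dim=-1, keepdim=True)   # [num_edges, num_heads, 1]

    # Identical to computing an explicit per-edge dot product
    scores_ref = torch.einsum('ehd,ehd->eh', K_src, Q_dst).unsqueeze(-1)
    assert torch.allclose(scores, scores_ref, atol=1e-5)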

Please refer to the detailed explanation in issue #4.

Thanks for your reply! I debugged the code again and now understand how you get the dot product.