graphdeeplearning / graphtransformer

Graph Transformer Architecture. Source code for "A Generalization of Transformer Networks to Graphs", DLG-AAAI'21.

Home Page: https://arxiv.org/abs/2012.09699


Attention Matrix

pbonazzi opened this issue

Hi! Congratulations on your paper, and thank you for making the implementation publicly available.

Quick question about this function:

    def src_dot_dst(src_field, dst_field, out_field):  # outer wrapper restored for context
        def func(edges):
            # element-wise product of K at the source node and Q at the destination node, per edge
            return {out_field: (edges.src[src_field] * edges.dst[dst_field])}
        return func

Why do you do an element-wise multiplication of K and Q and not a dot product? The scores have dimensions [num_edges, num_heads, hidden_dim/num_heads], but I expected a [num_edges, num_edges] matrix.
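For reference, here is a minimal sketch of how I understand an edge UDF like this is applied with DGL's apply_edges; the toy graph, field names ('K_h', 'Q_h', 'score'), and shapes are my own illustrative assumptions, not necessarily the repo's:

    import dgl
    import torch

    def src_dot_dst(src_field, dst_field, out_field):
        def func(edges):
            return {out_field: (edges.src[src_field] * edges.dst[dst_field])}
        return func

    # Hypothetical toy graph: 3 nodes, 3 directed edges
    g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 0])))
    num_heads, head_dim = 8, 16
    g.ndata['K_h'] = torch.randn(g.num_nodes(), num_heads, head_dim)
    g.ndata['Q_h'] = torch.randn(g.num_nodes(), num_heads, head_dim)

    # For each edge (u, v) this stores K_h[u] * Q_h[v] element-wise, so the
    # result lives on edges with shape [num_edges, num_heads, head_dim];
    # no dense [num_nodes, num_nodes] attention matrix is ever built
    g.apply_edges(src_dot_dst('K_h', 'Q_h', 'score'))
    print(g.edata['score'].shape)  # torch.Size([3, 8, 16])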

You can also reach me here: pietrobonazzi.edu@gmail.com
Hope to hear from you soon, Pietro Bonazzi

Hi, I also don't understand why a Hadamard product of K and Q is used here. Have you figured it out?

Hi @pbonazzi, @GaichaoLee,
After the element-wise multiplication in the code snippet you quoted above, a sum is applied across the feature dimension (d = hidden_dim/num_heads) to obtain the final scalar scores. Effectively, it is a dot product.
The element-wise multiplication also makes it possible to maintain a d-dimensional edge feature, which is used in the GraphTransformer layer with edge features.
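To illustrate, a minimal sketch with made-up shapes (independent of the repo's code): summing the element-wise product over the feature dimension yields exactly the per-edge dot products.

    import torch

    num_edges, num_heads, d = 5, 8, 16  # d = hidden_dim / num_heads
    K_src = torch.randn(num_edges, num_heads, d)  # K at each edge's source node
    Q_dst = torch.randn(num_edges, num_heads, d)  # Q at each edge's destination node

    # Element-wise product: keeps a d-dimensional feature per edge and head
    hadamard = K_src * Q_dst                      # [num_edges, num_heads, d]

    # Summing over the feature dimension gives the scalar attention scores
    scores = hadamard.sum(dim=-1, keepdim=True)   # [num_edges, num_heads, 1]

    # Identical to computing an explicit per-edge dot product
    scores_ref = torch.einsum('ehd,ehd->eh', K_src, Q_dst).unsqueeze(-1)
    assert torch.allclose(scores, scores_ref, atol=1e-5)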

Please refer to the detailed explanation in issue #4.

Thanks for your reply! I debugged the code again and now understand how you get the dot product.