graphdeeplearning / graphtransformer

Graph Transformer Architecture. Source code for "A Generalization of Transformer Networks to Graphs", DLG-AAAI'21.

Home Page: https://arxiv.org/abs/2012.09699

Technical question

DevinKreuzer opened this issue

Hi, thanks for the great paper :)

I was just curious about the 'z' variable on line 59 of graph_transformer_layer.py. I cannot seem to find its equivalent in the paper. It seems you are normalizing the output heads by the sum of the attention weights?

Would appreciate a little pointer :)

Thanks,
Devin

Hi @DevinKreuzer, thanks for your question.
We follow the DGL implementation with built-in functions, as described in detail here.

The 'z' is the denominator of the softmax: the attention-weighted values are accumulated unnormalized, and dividing by 'z' (the sum of the exponentiated attention scores over each node's incoming edges) completes the normalization.
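For concreteness, here is a minimal sketch (not the repo's exact code) of that two-step softmax using DGL built-in functions. Writing the attention as softmax(w_ij) = exp(w_ij) / sum_k exp(w_ik), the layer accumulates a numerator wV_i = sum_j exp(w_ij) * V_j and a denominator z_i = sum_j exp(w_ij) by message passing, then divides. The field names 'score', 'V_h', 'wV' and 'z' mirror graph_transformer_layer.py; the graph and tensor sizes below are made up for illustration.

```python
import torch
import dgl
import dgl.function as fn

def propagate_attention(g):
    # 'score' on each edge holds the exponentiated (unnormalized)
    # attention weight, i.e. the softmax numerator for that edge.
    # Weighted-value numerator: wV_i = sum_j score_ij * V_j
    g.update_all(fn.u_mul_e('V_h', 'score', 'm'), fn.sum('m', 'wV'))
    # Softmax denominator: z_i = sum_j score_ij
    g.update_all(fn.copy_e('score', 'm'), fn.sum('m', 'z'))

# Toy usage: a fully connected 3-node graph, 2 heads of dimension 4.
src = torch.tensor([0, 0, 1, 1, 2, 2])
dst = torch.tensor([1, 2, 0, 2, 0, 1])
g = dgl.graph((src, dst))
g.ndata['V_h'] = torch.randn(3, 2, 4)          # value vectors, per head
g.edata['score'] = torch.randn(6, 2, 1).exp()  # exp'd raw scores, > 0
propagate_attention(g)

# Dividing by 'z' completes the softmax: each head's output is the
# attention-weighted average of the neighbours' value vectors.
h_out = g.ndata['wV'] / g.ndata['z']
print(h_out.shape)  # torch.Size([3, 2, 4])
```

Splitting the softmax this way lets both terms be computed with sparse message passing over the existing edges, rather than materializing a dense node-by-node attention matrix.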

Hope the referenced article makes this clear and shows that there is no inconsistency with the equations in the paper.

Cheers,
Vijay

Closing the issue!
Feel free to reopen if you have further questions.