Average of the context vector in lecture "Contextual Word Representation"
lasp73 opened this issue
Thank you for the great course! The lectures and the other materials are really valuable for learning more about NLU.
I am not an enrolled student, but I've decided to ask a minor question here about the first lecture on "Contextual Word Representation".
In slide 5 (https://web.stanford.edu/class/cs224u/slides/cs224u-contextualreps-part1-handout.pdf), the "context vector" is evaluated as
My question: is it really necessary to take the "mean" rather than a "sum"?
The attention weights
What I see often is to scale the dot products (before the softmax)
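To make the two variants in question concrete, here is a minimal NumPy sketch. It assumes plain dot-product attention, reconstructed from the wording of this question rather than copied from the slide: the scores are dot products between a target state and each hidden state, the weights are the softmax of those scores, and the context vector is either the mean or the sum of the weighted states. The names `attention_context`, `reduce`, and `scale` are hypothetical, and dividing by the square root of the dimensionality is the common convention assumed for the scaling.

```python
import numpy as np


def softmax(x):
    # Subtract the max for numerical stability; the result is unchanged.
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attention_context(target, states, reduce="sum", scale=False):
    """Hypothetical helper: dot-product attention over `states` (N x d)."""
    scores = states @ target                 # one dot product per state, shape (N,)
    if scale:
        # Assumed convention: divide by sqrt(d) before the softmax.
        scores = scores / np.sqrt(target.shape[0])
    weights = softmax(scores)                # attention weights, sum to 1
    weighted = weights[:, None] * states     # each state times its weight
    if reduce == "mean":
        context = weighted.mean(axis=0)
    else:
        context = weighted.sum(axis=0)
    return context, weights


rng = np.random.default_rng(0)
states = rng.normal(size=(4, 8))   # N = 4 hidden states of dimension d = 8
target = rng.normal(size=8)

ctx_sum, w = attention_context(target, states, reduce="sum")
ctx_mean, _ = attention_context(target, states, reduce="mean")
_, w_scaled = attention_context(target, states, scale=True)
```

In this sketch the mean is just the sum divided by N, so the two context vectors differ only by a constant factor. The scaled weights still sum to 1 but are flatter than the unscaled ones, since shrinking the scores before the softmax lowers the largest weight.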
Thanks again!
OK, the very next lecture covers the changes above, so I am closing the issue.