EMNLP 2018: Multi-Head Attention with Disagreement Regularization; NAACL 2019: Information Aggregation for Multi-Head Attention with Routing-by-Agreement
lsquser opened this issue 3 years ago
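Does the code implement the negative cosine similarity? In the line `cos_diff = tf.reduce_mean(cos_diff, axis=[-2,-1]) + 1.0`, what is the purpose of the 1.0 added at the end? Does `cos_diff` hold the cosine similarity with the negative sign applied, or without it? The paper applies the negative sign.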
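To make the question concrete, here is a minimal NumPy sketch of the two readings. It is not the repository's code: `pairwise_cosine` and `shifted_mean` are hypothetical helpers, and the two-head toy inputs are made up for illustration. It only assumes that `cos_diff` is a matrix of pairwise cosine values between heads, either negated or not, before the quoted line runs.

```python
import numpy as np

def pairwise_cosine(heads):
    """Cosine similarity between every pair of rows in `heads` (num_heads, dim)."""
    norms = np.linalg.norm(heads, axis=-1, keepdims=True)
    unit = heads / np.maximum(norms, 1e-12)  # guard against zero-length vectors
    return unit @ unit.T                      # shape (num_heads, num_heads)

def shifted_mean(cos_diff):
    """Mirror of the quoted line: mean over the last two axes, then add 1.0."""
    return np.mean(cos_diff, axis=(-2, -1)) + 1.0

identical = np.array([[1.0, 0.0], [1.0, 0.0]])   # two heads pointing the same way
opposite = np.array([[1.0, 0.0], [-1.0, 0.0]])   # two heads pointing opposite ways

# Reading A (assumption): cos_diff already holds the NEGATED cosine.
neg_identical = shifted_mean(-pairwise_cosine(identical))
neg_opposite = shifted_mean(-pairwise_cosine(opposite))

# Reading B (assumption): cos_diff holds the plain cosine, no negative sign.
pos_identical = shifted_mean(pairwise_cosine(identical))

print(neg_identical, neg_opposite, pos_identical)
```

Under either reading the mean lies in [-1, 1], so adding 1.0 moves the value into [0, 2] and keeps it non-negative. A constant offset does not change the gradient, so the 1.0 by itself cannot tell us which sign convention the code uses; that depends on how `cos_diff` is computed before this line.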