EMNLP 2018: Multi-Head Attention with Disagreement Regularization; NAACL 2019: Information Aggregation for Multi-Head Attention with Routing-by-Agreement
jxxiao opened this issue 3 years ago · comments
cos_diff_square is not used in the computation at all