How to keep the constraints sum(k)=1 and sum(α)=1?
sunzewei2715 opened this issue
sunzewei2715 commented
In the original paper (Weighted Transformer), the authors state that "all bounds are respected during each training step by projection."
I don't understand what "by projection" means here, and I don't know how to maintain the constraints sum(k)=1 and sum(α)=1 during training.
As far as I can tell, this repository does nothing to enforce them apart from the initialization. Could you please explain?
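
To make the question concrete: one possible reading of "by projection", which I have not confirmed against the paper or this repository, is a Euclidean projection of each weight vector onto the probability simplex (non-negative entries summing to 1) after every optimizer step. Below is a minimal NumPy sketch under that assumption. The function name `project_to_simplex` is hypothetical, and the sort-based algorithm is the standard simplex projection, not something taken from the paper.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a 1-D vector onto {x : x >= 0, sum(x) = 1}.

    Assumption: this is only one plausible meaning of "projection";
    the paper does not spell out the exact procedure.
    """
    v = np.asarray(v, dtype=float)
    n = v.size
    u = np.sort(v)[::-1]                 # entries in descending order
    css = np.cumsum(u) - 1.0             # cumulative sums minus the target total
    ks = np.arange(1, n + 1)
    cond = u - css / ks > 0              # true for the entries that stay positive
    rho = ks[cond][-1]                   # number of entries that stay positive
    tau = css[cond][-1] / rho            # shift that makes the result sum to 1
    return np.maximum(v - tau, 0.0)

# Example: weights that drifted off the simplex after a gradient update.
weights = np.array([0.3, 0.9, -0.2])
projected = project_to_simplex(weights)
print(projected, projected.sum())
```

Under this reading, the training loop would call such a function on the k and α vectors right after each parameter update, so that both sums return to 1 before the next forward pass.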