How to keep the constraints sum(k)=1 and sum(α)=1?

Question

sunzewei2715 opened this issue 5 years ago

In the original paper (Weighted Transformer), the authors mention that "all bounds are respected during each training step by projection."

I have no idea what "by projection" means here, and I don't know how to enforce the constraints sum(k)=1 and sum(α)=1 during training.

As far as I can tell, this repository does nothing to enforce these constraints beyond the initialization. Could you please explain?
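For context, one common reading of "projection" is a Euclidean projection onto the probability simplex (all entries non-negative, summing to 1) applied to the weights after every gradient update. The sketch below shows that reading in plain Python using the standard sort-based method. This is only an assumption about what the paper means: the function name `project_to_simplex` is hypothetical, and nothing here is taken from the paper's code or from this repository.

```python
def project_to_simplex(v):
    """Return the closest point (in Euclidean distance) to v that has
    non-negative entries summing to 1.

    Hypothetical helper for illustration; not part of this repository.
    """
    # Sort a copy in descending order and scan the running sums to find
    # the shift theta that makes the clipped entries sum to 1.
    u = sorted(v, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        running_sum += u_j
        candidate = (running_sum - 1.0) / j
        # The condition holds for a prefix of the sorted entries;
        # the last candidate that satisfies it is the correct shift.
        if u_j - candidate > 0:
            theta = candidate
    # Shift every entry by theta and clip negatives to zero.
    return [max(x - theta, 0.0) for x in v]


# Example: weights that drifted off the simplex after a gradient step.
weights = [0.3, 0.3, 0.3, 0.3]
projected = project_to_simplex(weights)
print(projected, sum(projected))
```

Under this reading, the projection would be applied separately to the k vector and to the α vector after each update (in a PyTorch loop, presumably right after `optimizer.step()`), so that both sums are restored to 1 before the next forward pass. A softmax reparameterization would be a different way to get the same constraint, but it is not what "projection" usually refers to.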