Do the Attentions / MLPs run in parallel?
tensorpro opened this issue
tensorpro commented
The attention implementation here looks like it could be run as parallel attention.
But I was curious: does this implementation actually result in the attention and MLP blocks running in parallel?
Rafi Witten commented
Yes, this is parallel attention; see Section 2 of the PaLM paper (https://arxiv.org/pdf/2204.02311.pdf).
The fact that the attention and MLP computations can be overlapped tends to improve performance.
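For readers landing on this issue: a minimal sketch of the difference between the two formulations, not the repo's actual code. All names (`sequential_block`, `parallel_block`, the single-head `attention` and `mlp` helpers, the `params` layout) are hypothetical and simplified for illustration.

```python
# Sketch contrasting the standard sequential transformer block with the
# PaLM-style parallel formulation. Hypothetical, simplified code: single
# head, no masking, no dropout, 2-D (seq, d_model) inputs.
import jax
import jax.numpy as jnp

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / jnp.sqrt(var + eps)

def attention(x, wq, wk, wv, wo):
    # Single-head self-attention, purely illustrative.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(scores, axis=-1) @ v @ wo

def mlp(x, w1, w2):
    return jax.nn.gelu(x @ w1) @ w2

def sequential_block(x, params):
    # Standard formulation:
    #   y = x + Attn(LN(x));  y = y + MLP(LN(y))
    # The MLP depends on the attention output, so the two cannot overlap.
    x = x + attention(layer_norm(x), *params["attn"])
    return x + mlp(layer_norm(x), *params["mlp"])

def parallel_block(x, params):
    # PaLM-style parallel formulation (PaLM Section 2):
    #   y = x + Attn(LN(x)) + MLP(LN(x))
    # Both branches read the same normalized input and have no data
    # dependence on each other, so the compiler is free to fuse the
    # shared LayerNorm and overlap the attention and MLP matmuls.
    h = layer_norm(x)
    return x + attention(h, *params["attn"]) + mlp(h, *params["mlp"])
```

Note that "running in parallel" here means the two branches are data-independent, so the compiler can schedule/fuse them together; there is no explicit threading. The PaLM paper linked above reports roughly a 15% training speedup at large scale from this formulation.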