Do the Attentions / MLPs run in parallel?
tensorpro opened this issue
tensorpro commented
The attention implementation here looks like it could be run as parallel attention.
But I was curious: does this implementation actually result in the attention and MLP blocks running in parallel?
Rafi Witten commented
Yes, this is parallel attention; see Section 2 of the PaLM paper (https://arxiv.org/pdf/2204.02311.pdf).
The fact that the attention and MLP computations can be overlapped tends to improve performance.
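For readers landing on this issue: a minimal sketch of the difference between the two formulations, not the repo's actual code. All names (`sequential_block`, `parallel_block`, the single-head `attention` and `mlp` helpers, the `params` layout) are hypothetical and simplified for illustration.

```python
# Sketch contrasting the standard sequential transformer block with the
# PaLM-style parallel formulation. Hypothetical, simplified code: single
# head, no masking, no dropout, 2-D (seq, d_model) inputs.
import jax
import jax.numpy as jnp

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / jnp.sqrt(var + eps)

def attention(x, wq, wk, wv, wo):
    # Single-head self-attention, purely illustrative.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(scores, axis=-1) @ v @ wo

def mlp(x, w1, w2):
    return jax.nn.gelu(x @ w1) @ w2

def sequential_block(x, params):
    # Standard formulation:
    #   y = x + Attn(LN(x));  y = y + MLP(LN(y))
    # The MLP depends on the attention output, so the two cannot overlap.
    x = x + attention(layer_norm(x), *params["attn"])
    return x + mlp(layer_norm(x), *params["mlp"])

def parallel_block(x, params):
    # PaLM-style parallel formulation (PaLM Section 2):
    #   y = x + Attn(LN(x)) + MLP(LN(x))
    # Both branches read the same normalized input and have no data
    # dependence on each other, so the compiler is free to fuse the
    # shared LayerNorm and overlap the attention and MLP matmuls.
    h = layer_norm(x)
    return x + attention(h, *params["attn"]) + mlp(h, *params["mlp"])
```

Note that "running in parallel" here means the two branches are data-independent, so the compiler can schedule/fuse them together; there is no explicit threading. The PaLM paper linked above reports roughly a 15% training speedup at large scale from this formulation.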