google / maxtext

A simple, performant and scalable Jax LLM!


Do the Attentions / MLPs run in parallel?

tensorpro opened this issue

The attention implementation here looks like it could be run as parallel attention.

But I was curious: will this implementation actually result in the attentions and MLPs running in parallel?

Yes, this is parallel attention, as described in PaLM, Section 2 (https://arxiv.org/pdf/2204.02311.pdf).

The fact that the attention and MLP computations can be overlapped tends to improve performance.
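
For concreteness, here is a minimal JAX sketch contrasting the two formulations. The names (`serial_block`, `parallel_block`, `layer_norm`) and the stand-in branches are illustrative, not maxtext's actual modules:

```python
import jax.numpy as jnp

def layer_norm(x, eps=1e-6):
    """Plain LayerNorm without learned scale/bias, for illustration."""
    mean = jnp.mean(x, axis=-1, keepdims=True)
    var = jnp.var(x, axis=-1, keepdims=True)
    return (x - mean) / jnp.sqrt(var + eps)

def serial_block(x, attn, mlp):
    # Standard transformer block: the MLP reads the attention output,
    # so the two matmul chains must run one after the other.
    x = x + attn(layer_norm(x))
    return x + mlp(layer_norm(x))

def parallel_block(x, attn, mlp):
    # PaLM-style parallel block: both branches read the same normalized
    # input, so the compiler is free to overlap the attention and MLP
    # computations.
    h = layer_norm(x)
    return x + attn(h) + mlp(h)

# Toy usage with stand-in branches (a real model would pass full
# attention / feed-forward modules here):
x = jnp.ones((2, 8, 16))
attn = lambda h: 0.1 * h
mlp = lambda h: 0.2 * h
y = parallel_block(x, attn, mlp)
```

Because the two branches share one layer norm and have no serial dependency, their input projections can also be fused into larger matmuls; the PaLM paper reports roughly 15% faster training at large scale from this formulation.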