google / maxtext

A simple, performant and scalable Jax LLM!

Support for T5

kishorenc opened this issue · comments

Do you have plans to support encoder-decoder models like T5? It would be great to have T5 with flash attention 😃

What specific model would you like supported? We would only take this on if we saw sufficient interest (but in practice we see heavy movement towards decoder-only models).

Decoder-only models are great for generative use cases, but the T5 family is the workhorse for many discriminative tasks. For example, the flan-t5-base model has had 2M downloads on Hugging Face in the last month. Support for flan-t5 would add huge value for the community.

It'd be great to have T5 models here as well.

I'm going to try to turn MaxText into an encoder-decoder model anyway, so native support is of course also appreciated :)
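For context, the main architectural piece an encoder-decoder model like T5 adds on top of a decoder-only stack is cross-attention: the decoder's queries attend over the encoder's output states. Below is a minimal single-head sketch in plain NumPy to illustrate the idea; it is not MaxText code, and the function names are made up for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_h, encoder_h):
    """Single-head cross-attention sketch (projections omitted).

    decoder_h: (tgt_len, d) decoder hidden states (queries)
    encoder_h: (src_len, d) encoder outputs (keys and values)

    This decoder-attends-to-encoder block is exactly what a
    decoder-only model lacks.
    """
    d = decoder_h.shape[-1]
    scores = decoder_h @ encoder_h.T / np.sqrt(d)   # (tgt_len, src_len)
    weights = softmax(scores, axis=-1)              # rows sum to 1
    return weights @ encoder_h                      # (tgt_len, d)

dec = np.random.randn(3, 8)   # 3 target positions
enc = np.random.randn(5, 8)   # 5 source positions
out = cross_attention(dec, enc)
print(out.shape)              # (3, 8)
```

Note that, unlike decoder self-attention, no causal mask is applied here: every decoder position may look at every encoder position.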