Support for T5
kishorenc opened this issue · comments
Do you have plans to support encoder-decoder models like T5? It would be great to have T5 with flash attention.
What specific model would you like supported? We would only take this on if we saw sufficient interest (but in practice we see heavy movement towards decoder-only models).
Decoder-only models are great for generative use cases, but the T5 family is the workhorse for many discriminative tasks. For example, the flan-t5-base model has 2M downloads on Hugging Face in the past month. Support for flan-t5 would add huge value for the community.
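For context, the piece T5 adds over decoder-only models is cross-attention (decoder queries attending over encoder keys/values). A minimal sketch of that pattern through PyTorch's fused `scaled_dot_product_attention` entry point is below; all shapes are illustrative. One caveat worth noting: T5 adds a learned relative position bias to the attention logits, which most flash kernels do not support natively, though SDPA's `attn_mask` argument can carry such an additive float bias.

```python
# Sketch (not the repo's implementation): cross-attention via PyTorch SDPA.
# Shapes and dimensions are illustrative, not taken from any T5 checkpoint.
import torch
import torch.nn.functional as F

batch, n_heads, head_dim = 2, 8, 64
src_len, tgt_len = 16, 5  # encoder and decoder sequence lengths

q = torch.randn(batch, n_heads, tgt_len, head_dim)  # decoder queries
k = torch.randn(batch, n_heads, src_len, head_dim)  # encoder keys
v = torch.randn(batch, n_heads, src_len, head_dim)  # encoder values

# Cross-attention: no causal mask; K/V come from the encoder side.
out = F.scaled_dot_product_attention(q, k, v, is_causal=False)
print(out.shape)  # (batch, n_heads, tgt_len, head_dim)
```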
It'd be great to have T5 models here as well.
I'm going to try to turn MaxText into an encoder-decoder anyway, so native support is of course also appreciated :)