google / seqio

Task-based datasets, preprocessing, and evaluation for sequence models.

How to decide ideal mixture rates?

StephennFernandes opened this issue

What is the best way to decide which mixture ratio is optimal?

In the mT5 paper, an alpha value of 0.3 gave the best balance between performance on high-resource and low-resource languages.

However, I am pretraining mT5 on Indian languages, and my multilingual corpus is highly imbalanced: Hindi has 60M+ samples, while Kashmiri has only around 100k.

So I wanted to know whether I could tune this hyperparameter somehow in t5x, or whether simply using alpha=0.3 would work fine for my use case.
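For context, the alpha in question is the exponent in the mT5 sampling rule, where a language L is sampled with probability proportional to |L|^alpha. Below is a minimal sketch of what that does to the two corpus sizes mentioned above. The exact counts are rounded from "60M+" and "around 100k", and the helper name `temperature_rates` is made up for illustration; it is not part of seqio or t5x.

```python
def temperature_rates(sizes, alpha):
    """Return sampling rates proportional to size ** alpha, normalized to sum to 1."""
    weights = {lang: n ** alpha for lang, n in sizes.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}


# Approximate sizes taken from the question (assumed round numbers).
sizes = {"hi": 60_000_000, "ks": 100_000}

proportional = temperature_rates(sizes, alpha=1.0)  # plain size-proportional sampling
smoothed = temperature_rates(sizes, alpha=0.3)      # the mT5 setting

for lang in sizes:
    print(f"{lang}: alpha=1.0 -> {proportional[lang]:.4f}, alpha=0.3 -> {smoothed[lang]:.4f}")
```

With these two languages alone, alpha=0.3 raises the Kashmiri share from well under 1% to roughly 13%, which also means each Kashmiri example is repeated far more often than each Hindi example during training. Whether that trade-off is acceptable is exactly what would need to be checked empirically.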

Hi @StephennFernandes, deciding mixture rates is a research problem, so there is no straightforward answer. I'd recommend running an hparam search to arrive at a good set of mixing rates if possible, or surveying other papers to find acceptable rates.
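One possible way to set up such a search is to precompute a set of mixing rates per candidate alpha and then train one run per candidate, comparing per-language validation metrics. The sketch below only generates the candidate rate tables; the alpha grid, the corpus sizes, and the `pretrain_<lang>` task names are all placeholders, not values recommended by the seqio maintainers. Each result is a list of `(task_name, rate)` pairs, which is the general shape seqio mixtures take for their task list, but the registration and training steps are left out.

```python
def temperature_rates(sizes, alpha):
    """Return sampling rates proportional to size ** alpha, normalized to sum to 1."""
    weights = {lang: n ** alpha for lang, n in sizes.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}


def build_sweep(sizes, alphas):
    """Map each candidate alpha to a list of (task_name, rate) pairs."""
    sweep = {}
    for alpha in alphas:
        rates = temperature_rates(sizes, alpha)
        # "pretrain_<lang>" is a hypothetical task naming scheme.
        sweep[alpha] = [(f"pretrain_{lang}", rate) for lang, rate in rates.items()]
    return sweep


# Assumed sizes (rounded from the question) and an arbitrary candidate grid.
sizes = {"hi": 60_000_000, "ks": 100_000}
candidate_alphas = (0.2, 0.3, 0.5, 0.7, 1.0)

sweep = build_sweep(sizes, candidate_alphas)
for alpha, tasks in sweep.items():
    print(alpha, [(name, round(rate, 4)) for name, rate in tasks])
```

Lower alpha values push the mixture toward uniform sampling and higher values toward size-proportional sampling, so a coarse grid like this is usually enough to see which direction helps the low-resource languages before refining further.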