mlfoundations / scaling

Language models scale reliably with over-training and on downstream tasks


Epochs

borgr opened this issue

In the saved models on HF there is some indication of epochs.
However, in the paper I don't see anywhere that you mention making more than one epoch over the data.
How do epochs work for you? Do you have a different data size for each of the pretraining datasets, or did you normalize them all to the same size? If not, is there somewhere an indication of when each new epoch starts? (This is relevant because you talk about over-training, but I am trying to separate it from phenomena like those in the datablations paper: https://github.com/huggingface/datablations#models.)

Hi @borgr, we don't do multiple passes over the datasets. Admittedly, the use of the word "epoch" in our checkpointing logic is confusing. When developing the open_lm codebase we borrowed certain conventions from open_clip, where the number of epochs is synonymous with the number of checkpoints to save. We have a TODO to refactor the naming, but since it is a breaking change we are waiting until after NeurIPS.
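
For anyone else puzzled by the naming, here is a minimal sketch of what that convention looks like. This is not the actual open_lm code, and the function and variable names (`train_single_pass`, `save_checkpoint`, `num_checkpoints`, etc.) are hypothetical; it just illustrates how a single pass over the data gets split into "epochs" that only count checkpoints.

```python
# Hypothetical sketch of the "epoch == checkpoint interval" convention
# described above. The model sees the data exactly once; the run is simply
# divided into `num_checkpoints` chunks, and a file named epoch_{k}.pt is
# written after each chunk.

def train_single_pass(model, dataloader, total_steps, num_checkpoints, save_checkpoint):
    """Run one pass over the data, saving `num_checkpoints` checkpoints.

    `model.training_step` and `save_checkpoint(model, name)` are stand-ins
    for whatever the codebase actually uses; "epoch" here only labels the
    checkpoint, it does not mean a second pass over the data.
    """
    steps_per_checkpoint = total_steps // num_checkpoints
    data_iter = iter(dataloader)
    step = 0
    for epoch in range(num_checkpoints):          # "epoch" = checkpoint index
        for _ in range(steps_per_checkpoint):
            batch = next(data_iter)               # data is never repeated
            model.training_step(batch)
            step += 1
        save_checkpoint(model, f"epoch_{epoch + 1}.pt")
    return step
```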

Sorry about the confusion and thanks for the question!