bigscience-workshop / bigscience

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

What is the number of epochs of the final training?

cmsflash opened this issue · comments

The config file lists the sample count of the dataset as 220M and a global batch size of 2048, which works out to ~107K steps per epoch. The main README says the total number of training steps is 95K, which means epoch 1 is never finished. However, the training chronicles suggest more than one epoch of training.
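A quick back-of-envelope check of the figures above (assuming the 220M sample count and batch size 2048 from the config file, and the 95K total steps from the README):

```python
# Values quoted in this issue (assumed from the config file and README).
samples_per_epoch = 220_000_000
global_batch_size = 2048
total_steps = 95_000

# Steps needed to see every sample once.
steps_per_epoch = samples_per_epoch / global_batch_size
print(f"steps per epoch: {steps_per_epoch:,.0f}")  # ~107,422

# Fraction of one epoch covered by the full training run.
print(f"epochs completed: {total_steps / steps_per_epoch:.2f}")  # ~0.88
```

By this arithmetic the run covers only about 88% of a single epoch, which is the apparent contradiction with the chronicles.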

What is the number of epochs for the final training, and what am I missing?