bigscience-workshop / bigscience

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

Sharing the 1.3B-Pile@300B model

BlinkDL opened this issue 2 years ago

PENG Bo commented 2 years ago

The 1.3B-Pile@300B model is quite strong:
https://docs.google.com/spreadsheets/d/1CI8Q9RCblLRzUOPJ6ViqBmo284-8ojluQ-CmaEuhuv0/edit#gid=1295801165

lambada 0.6088, piqa 0.7160, hellaswag 0.5209: these are all better than GPT-Neo 1.3B.

Could you share the model? Thank you.