jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.


Minimum learning rate

artnoage opened this issue · comments

The minimum learning rate is set to the same value as the maximum. Is this intentional or a mistake? If intentional, why? (Feel free to skip the explanation if it is too bothersome.)

This is a mistake on our part, and thanks a million for spotting it! (It should rightly be 4e-5.)
Fortunately, we are still at a very early stage of training, so there is still room to correct it. I have fixed the mistake, and TinyLlama's learning-rate curve will now look like the one below:
[image: corrected learning-rate curve, decaying to a minimum of 4e-5]
I will update the README to point to this issue together with the release of the 500B-token checkpoint.
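For reference, below is a minimal Python sketch of the kind of schedule being described: cosine decay that bottoms out at a nonzero minimum learning rate instead of staying pinned at the maximum. The `max_lr` and `warmup_steps` values are illustrative assumptions, not taken from this thread; the thread only establishes that the minimum should be 4e-5.

```python
import math

def cosine_lr(step, max_steps, max_lr=4e-4, min_lr=4e-5, warmup_steps=2000):
    """Linear warmup followed by cosine decay down to min_lr.

    max_lr and warmup_steps are illustrative placeholders; the fix
    discussed in this issue is that min_lr should be 4e-5 rather
    than equal to the maximum learning rate.
    """
    # Linear warmup from 0 to max_lr over the first warmup_steps steps.
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    # Cosine anneal from max_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return min_lr + (max_lr - min_lr) * cosine
```

With `min_lr` mistakenly set equal to `max_lr`, the decay term vanishes and the schedule stays flat after warmup, which is the bug reported above.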