mlfoundations / open_lm

A repository for research on medium-sized language models.

Ablate on initialization

mitchellnw opened this issue

Interested in ablating two changes to the initialization:

A) changing layer_id + 1 to args.num_layers.
B) removing the line std = std / math.sqrt(2 * (layer_id + 1))
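For reference, here is a minimal sketch of how the three variants differ. It assumes option A applies to the same scaling line quoted in option B. The function name `init_std`, the `variant` argument, and the `base_std` value are hypothetical and only for illustration; they are not taken from the open_lm code.

```python
import math


def init_std(base_std: float, layer_id: int, num_layers: int, variant: str = "baseline") -> float:
    """Return the init std for one layer under each ablation (hypothetical helper)."""
    if variant == "baseline":
        # Current behavior quoted in the issue: scale depends on the layer index.
        return base_std / math.sqrt(2 * (layer_id + 1))
    if variant == "A":
        # Option A: replace layer_id + 1 with the total number of layers,
        # so every layer gets the same scale.
        return base_std / math.sqrt(2 * num_layers)
    if variant == "B":
        # Option B: drop the depth-dependent scaling line entirely.
        return base_std
    raise ValueError(f"unknown variant: {variant}")


if __name__ == "__main__":
    num_layers = 8   # assumed depth, for illustration only
    base_std = 0.02  # assumed base std, for illustration only
    for layer_id in range(num_layers):
        row = [init_std(base_std, layer_id, num_layers, v) for v in ("baseline", "A", "B")]
        print(layer_id, row)
```

Under these assumptions, the baseline and option A agree only at the last layer (where `layer_id + 1 == num_layers`); option A gives earlier layers a smaller std than the baseline, while option B gives every layer a larger one.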

Related: #225

We tested #225 at 1B and it seems to hurt downstream evals significantly, unfortunately.