kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku

GPT-J: perplexity for checkpoints

danyaljj opened this issue

Thanks for sharing the checkpoints!
Is there a plot of perplexity as a function of training steps?

Information collected during training (ppl, evals, etc.) can be seen here: https://wandb.ai/eleutherai/mesh-transformer-jax/reports/6B-Rotary--Vmlldzo2NDQxNzY
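
If only a loss curve is available for a checkpoint, perplexity can be derived from it, since perplexity is the exponential of the mean per-token negative log-likelihood. Below is a minimal sketch, assuming the logged loss is a mean cross-entropy in nats per token (that unit is an assumption, not something confirmed in this thread). The function name and the `(step, loss)` pairs are hypothetical placeholders, not values from the actual training run.

```python
import math

def perplexity_from_loss(mean_nll_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_nll_nats)

# Placeholder (step, loss) pairs for illustration only; these are NOT
# real GPT-J training numbers.
history = [(1000, 3.0), (2000, 2.5)]

# Perplexity as a function of training step.
curve = [(step, perplexity_from_loss(loss)) for step, loss in history]

for step, ppl in curve:
    print(f"step={step} ppl={ppl:.2f}")
```

If the loss is logged in bits per token instead, use `2 ** loss` rather than `math.exp(loss)`.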