stanford-crfm / mistral

Mistral: A strong, northwesterly wind: Framework for transparent and accessible large-scale language model training, built with Hugging Face 🤗 Transformers.

Make a loss regression test for Mistral

dlwh opened this issue

We've now seen substantial regressions in loss compared to where we were ~9 months ago. The new data loading is part of the problem, but not all of it. We need a test that we can run regularly, that tracks changes in loss and gets upset when the loss changes. See also stanford-crfm/levanter#35
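
One possible shape for such a test, as a minimal sketch rather than anything that exists in Mistral today: record the loss at a few fixed steps from a known-good run, then compare a fresh run against that baseline with a relative tolerance. The function name `find_loss_regressions`, the `rel_tol` default, and all loss values below are hypothetical and chosen only for illustration. How the losses are produced (a short, fixed-seed training run, for example) is left out.

```python
from typing import Dict, List


def find_loss_regressions(
    baseline: Dict[int, float],
    current: Dict[int, float],
    rel_tol: float = 0.02,
) -> List[int]:
    """Return the steps where `current` loss exceeds `baseline` by more than `rel_tol`.

    Both arguments map a training step to the loss logged at that step.
    Assumes losses are positive, so a relative tolerance is meaningful.
    """
    regressed = []
    for step in sorted(baseline):
        if step not in current:
            # A missing step usually means logging changed; fail loudly.
            raise KeyError(f"current run has no loss logged at step {step}")
        if current[step] > baseline[step] * (1.0 + rel_tol):
            regressed.append(step)
    return regressed


# Hypothetical numbers for illustration only, not measurements from Mistral.
baseline_losses = {100: 6.0, 200: 5.0, 300: 4.5}
current_losses = {100: 6.05, 200: 5.3, 300: 4.5}

regressed_steps = find_loss_regressions(baseline_losses, current_losses)
print(regressed_steps)
```

In a test suite, the check would reduce to asserting that the returned list is empty. This sketch is one-sided: it only flags losses that got worse. If any change should be flagged, the comparison could use the absolute difference instead.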