mlfoundations / open_lm

A repository for research on medium-sized language models.

Gradient accumulation tests fail on GPU with amp_bf16 precision

sagadre opened this issue

Changing the precision from fp32 to amp_bf16 causes pytest tests/test_grad_accum.py to fail on GPU:

FAILED tests/test_grad_accum.py::test_grad_acc - AssertionError: Failed gradient checks at: ['tok_embeddings.weight', 'layers.0.attention.in_proj.weight', 'layers.0...
FAILED tests/test_grad_accum.py::test_grad_acc_fsdp - torch.multiprocessing.spawn.ProcessRaisedException:
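One possible contributor to the gradient-check failure (not confirmed in this issue) is that reduced-precision arithmetic is not associative: splitting a batch into micro-batches changes the order of the roundings, so accumulated gradients need not match full-batch gradients bit-for-bit under bf16, even when the logic is correct. The toy sketch below illustrates only that numerical effect in pure Python. It emulates bfloat16 by rounding float32 bit patterns (round-to-nearest-even) and sums hypothetical per-sample gradient values; it is a simplified model and not how open_lm or PyTorch autocast actually computes gradients. The names `to_bf16`, `bf16_sum`, `accumulated_sum` and `grads_close` are illustrative only.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even).

    Emulation only: bfloat16 keeps the top 16 bits of a float32 bit
    pattern, so we pack as float32 and round away the low 16 bits.
    Assumes finite inputs within float32 range (no NaN/inf handling).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF, plus 1 if the kept part is odd, to get ties-to-even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_sum(values):
    """Sum values sequentially, rounding every partial sum to bf16."""
    acc = 0.0
    for v in values:
        acc = to_bf16(acc + to_bf16(v))
    return acc


def accumulated_sum(micro_batches):
    """Sum each micro-batch in bf16, then accumulate the partial sums in bf16."""
    acc = 0.0
    for mb in micro_batches:
        acc = to_bf16(acc + bf16_sum(mb))
    return acc


def grads_close(a, b, rtol, atol=0.0):
    """Tolerance-based comparison instead of exact equality."""
    return abs(a - b) <= atol + rtol * abs(b)


# Hypothetical per-sample gradient contributions for a single parameter.
per_sample = [1.0, 2**-8, 2**-8]

full = bf16_sum(per_sample)                         # one full batch
accum = accumulated_sum([[1.0], [2**-8, 2**-8]])    # two micro-batches
exact = sum(per_sample)                             # float64 reference

print(full, accum, exact)                      # 1.0 1.0078125 1.0078125
print(full == accum)                           # False
print(grads_close(full, accum, rtol=1e-2))     # True
```

In the full-batch order, each 2**-8 added to 1.0 lands exactly halfway between two bf16 neighbours and rounds back down to 1.0, while the micro-batch order first combines the two small values into 2**-7, which survives. If this is what the test is hitting, comparing gradients with dtype-appropriate tolerances (for example via torch.testing.assert_close, whose default tolerances for bfloat16 are much looser than for float32) may be more appropriate than an fp32-level check; this is a suggestion, not a diagnosis.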