FSDPPrecision should support 16-true with a loss scaler
zaptrem opened this issue · comments
Description & Motivation
What if I want to use fp16 true, but with a loss scaler? This is closer to DeepSpeed's default settings. With FSDP and 16-true but no loss scaler, my model doesn't converge. However, with FSDP, 16-true, and a loss scaler (after commenting out the assert and fixing the typo'ed line so it does `return scaler` instead of `return None`), my model converges.
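For context, the point of a loss scaler is to keep fp16 gradients from underflowing to zero: the loss is multiplied by a scale factor before `backward()`, gradients are divided back down before the optimizer step, and the scale is adjusted dynamically when overflows appear. Here is a minimal pure-Python sketch of that dynamic-scaling logic; the names and default constants mirror `torch.cuda.amp.GradScaler`, but this is only an illustration of the idea, not Lightning's or DeepSpeed's actual implementation:

```python
class DynamicLossScaler:
    """Toy dynamic loss scaler: grow the scale while training is stable,
    back off when gradients overflow. Constants mirror the defaults of
    torch.cuda.amp.GradScaler; this is a sketch, not a real scaler."""

    def __init__(self, init_scale=2.0**16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def scale_loss(self, loss):
        # Multiply the loss so small fp16 gradients don't underflow to 0.
        return loss * self.scale

    def unscale(self, grad):
        # Divide gradients back down before the optimizer step.
        return grad / self.scale

    def update(self, found_overflow):
        if found_overflow:
            # Overflow this step: skip the update and shrink the scale.
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            # Stable step: after enough of them, grow the scale again.
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= self.growth_factor
                self._good_steps = 0

scaler = DynamicLossScaler(init_scale=4.0, growth_interval=2)
print(scaler.unscale(scaler.scale_loss(1e-3)))  # round-trips to 0.001
scaler.update(found_overflow=True)
print(scaler.scale)                             # backed off: 2.0
scaler.update(found_overflow=False)
scaler.update(found_overflow=False)
print(scaler.scale)                             # regrown: 4.0
```

With `16-mixed` this bookkeeping happens automatically; the request here is simply to allow the same machinery when master weights are also fp16 (`16-true`).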
Pitch
No response
Alternatives
No response
Additional context
No response
cc @Borda
I came here to open this issue, and you already did.
I second this issue.
I worked around it locally by patching the precision check to also accept `16-true`:

```python
if scaler is not None and precision not in ("16-mixed", "16-true"):
    raise ValueError(f"`precision={precision!r}` does not use a scaler, found {scaler}.")
```

but this should be fixed properly upstream.