Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

Home Page: https://lightning.ai


FSDPPrecision should support 16-true with a loss scaler

zaptrem opened this issue

Description & Motivation

`FSDPPrecision` currently rejects a scaler for anything other than `16-mixed`:

```python
if scaler is not None and self.precision != "16-mixed":
```

What if I want to use true fp16, but with a loss scaler? That is closer to DeepSpeed's default settings. With FSDP and `16-true` but no loss scaler, my model doesn't converge. However, with FSDP, `16-true`, and a loss scaler (after commenting out the assertion and fixing the line to return `scaler` instead of the typo'ed `None`), my model converges.
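For anyone needing a stopgap, here is a minimal sketch of a workaround, assuming Lightning 2.x names (`FSDPPrecision` keeping its scaler on `self.scaler`) and PyTorch's `ShardedGradScaler`; it simply sidesteps the check quoted above, so treat it as an illustration rather than a supported API:

```python
# Workaround sketch (assumes Lightning 2.x internals): subclass FSDPPrecision
# so that "16-true" can also carry a sharded loss scaler.
from lightning.pytorch import Trainer
from lightning.pytorch.plugins.precision import FSDPPrecision
from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler


class ScaledFSDPPrecision(FSDPPrecision):
    """FSDP precision plugin that pairs true fp16 with a gradient scaler."""

    def __init__(self) -> None:
        # Construct as plain "16-true" first; passing a scaler here would
        # trip the ValueError this issue is about.
        super().__init__(precision="16-true")
        # Attach the scaler afterwards; FSDPPrecision consults `self.scaler`
        # in its optimizer step (verify against your installed version).
        self.scaler = ShardedGradScaler()


trainer = Trainer(strategy="fsdp", devices=2, plugins=[ScaledFSDPPrecision()])
```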

Pitch

No response

Alternatives

No response

Additional context

No response

cc @Borda

I came here to open this issue, and you already did.
I second this issue.

I patched the package locally by relaxing the check to

```python
if scaler is not None and self.precision not in ["16-mixed", "16-true"]:
    raise ValueError(f"`precision={precision!r}` does not use a scaler, found {scaler}.")
```

but this should be fixed properly upstream.
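With that relaxed check in place (and assuming the constructor keeps a scaler you pass explicitly rather than discarding it), usage would look something like the following; the `Trainer` arguments are illustrative:

```python
# Hypothetical usage once FSDPPrecision accepts "16-true" with a scaler.
from lightning.pytorch import Trainer
from lightning.pytorch.plugins.precision import FSDPPrecision
from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler

precision_plugin = FSDPPrecision(precision="16-true", scaler=ShardedGradScaler())
trainer = Trainer(strategy="fsdp", devices=2, plugins=[precision_plugin])
```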