why pytorch-lightning doc say "Model-parallel training (FSDP and DeepSpeed)". I think there is something wrong.
HaoyaWHL opened this issue · comments
lijunran commented
📚 Documentation
https://lightning.ai/docs/pytorch/stable/advanced/model_init.html — in this doc, PL lists FSDP under "model-parallel training".
But as far as I know, FSDP is a data-parallel method.
For example, https://huggingface.co/docs/transformers/fsdp also describes FSDP as data parallel.
So I think there may be something wrong here.
cc @Borda
Brian French commented
FSDP is both model-parallel and data-parallel: each GPU holds only a shard of the model's parameters at any given time, and each GPU also sees only a shard of the data. So the doc is correct.
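To illustrate the point above, here is a minimal plain-Python sketch (no torch, and `shard` is a hypothetical helper, not a Lightning or FSDP API) of the two kinds of sharding happening at once: each rank stores only a slice of the parameters (the model-parallel aspect) and processes only a slice of the batch (the data-parallel aspect).

```python
def shard(items, rank, world_size):
    """Give each rank an interleaved shard of `items`."""
    return items[rank::world_size]

world_size = 2
params = ["w0", "w1", "w2", "w3"]   # model parameters
batch  = ["x0", "x1", "x2", "x3"]   # one global data batch

for rank in range(world_size):
    # model-parallel aspect: each rank stores only its parameter shard
    owned_params = shard(params, rank, world_size)
    # data-parallel aspect: each rank sees different samples
    local_batch = shard(batch, rank, world_size)
    print(f"rank {rank}: params={owned_params}, data={local_batch}")
```

Real FSDP additionally all-gathers the parameter shards just-in-time for each layer's forward/backward pass, then frees them, which is why no single GPU ever needs the full model in memory.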
lijunran commented
Oh, thanks for the reply. I get it now.