why pytorch-lightning doc say "Model-parallel training (FSDP and DeepSpeed)". I think there is something wrong.
HaoyaWHL opened this issue · comments
lijunran commented
📚 Documentation
https://lightning.ai/docs/pytorch/stable/advanced/model_init.html — in this doc, PL lists FSDP under "model-parallel training".
But as far as I know, FSDP is a data-parallel method.
For example, https://huggingface.co/docs/transformers/fsdp also describes FSDP as data parallel.
So I think there may be something wrong here.
cc @Borda
Brian French commented
FSDP is both model-parallel and data-parallel: each GPU holds only a shard of the model's parameters at any given time, and each GPU also sees only a shard of the data. So the doc is correct.
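To illustrate the point above, here is a minimal plain-Python sketch (no torch, and `shard` is a hypothetical helper, not a Lightning or FSDP API) of the two kinds of sharding happening at once: each rank stores only a slice of the parameters (the model-parallel aspect) and processes only a slice of the batch (the data-parallel aspect).

```python
def shard(items, rank, world_size):
    """Give each rank an interleaved shard of `items`."""
    return items[rank::world_size]

world_size = 2
params = ["w0", "w1", "w2", "w3"]   # model parameters
batch  = ["x0", "x1", "x2", "x3"]   # one global data batch

for rank in range(world_size):
    # model-parallel aspect: each rank stores only its parameter shard
    owned_params = shard(params, rank, world_size)
    # data-parallel aspect: each rank sees different samples
    local_batch = shard(batch, rank, world_size)
    print(f"rank {rank}: params={owned_params}, data={local_batch}")
```

Real FSDP additionally all-gathers the parameter shards just-in-time for each layer's forward/backward pass, then frees them, which is why no single GPU ever needs the full model in memory.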
lijunran commented
Oh, thanks for the reply. I get it now.