Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

Home Page: https://lightning.ai


Why does the pytorch-lightning doc say "Model-parallel training (FSDP and DeepSpeed)"? I think there is something wrong.

HaoyaWHL opened this issue · comments

📚 Documentation

In https://lightning.ai/docs/pytorch/stable/advanced/model_init.html, the Lightning docs list FSDP under "model-parallel training".

But as far as I know, FSDP is a data-parallel method. For example, https://huggingface.co/docs/transformers/fsdp describes FSDP as data parallel. So I think there may be something wrong here.

cc @Borda

FSDP is both model-parallel and data-parallel: each GPU holds only a shard of the model's parameters at any given time, and it also processes only its own shard of the data. The documentation is correct.
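
For context, here is a minimal sketch of what this looks like in practice, assuming Lightning >= 2.0; `LitModel` and the random dataset are hypothetical placeholders, not something from the linked docs. Passing `strategy="fsdp"` to the `Trainer` shards the model's parameters and optimizer state across the GPUs (the model-parallel aspect), while each rank still trains on its own portion of the batches (the data-parallel aspect).

```python
# Minimal sketch, assuming Lightning >= 2.0. LitModel and the dataset are
# hypothetical placeholders used only to illustrate enabling FSDP.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import lightning as L


class LitModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-3)


if __name__ == "__main__":
    dataset = TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,)))
    train_loader = DataLoader(dataset, batch_size=32)

    # "fsdp" shards parameters/optimizer state across the 4 GPUs (model-parallel),
    # while each rank still receives its own slice of the batches (data-parallel).
    trainer = L.Trainer(accelerator="gpu", devices=4, strategy="fsdp", max_epochs=1)
    trainer.fit(LitModel(), train_loader)
```

Lightning inserts a distributed sampler automatically here, so the data-parallel split across ranks happens without extra code; only the `strategy` argument changes compared to plain DDP.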

Oh, thanks for the reply. I got it.