EleutherAI / oslo

OSLO: Open Source for Large-scale Optimization

Home Page: https://oslo.eleuther.ai

FSDP returns different loss values with ZeRO stages 2 and 3

dongsungkim opened this issue

How to reproduce

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nnodes=1 --nproc_per_node=2  ./tests/torch/nn/parallel/data_parallel/test_fsdp.py --zero-stage 2
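To pin down what "different loss values" means, one option is to record the per-step losses from a `--zero-stage 2` run and a `--zero-stage 3` run and report the first step where they diverge. The sketch below is a hypothetical helper (`first_loss_mismatch` is not known to exist in OSLO's `test_fsdp.py`), and the tolerances and numbers are illustrative assumptions, not measurements from this issue.

```python
import math
from typing import Optional, Sequence


def first_loss_mismatch(
    losses_a: Sequence[float],
    losses_b: Sequence[float],
    rel_tol: float = 1e-5,
    abs_tol: float = 1e-6,
) -> Optional[int]:
    """Return the first step whose losses differ beyond tolerance, or None.

    Hypothetical helper for comparing two training runs step by step.
    """
    if len(losses_a) != len(losses_b):
        raise ValueError("runs recorded a different number of steps")
    for step, (a, b) in enumerate(zip(losses_a, losses_b)):
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            return step
    return None


# Illustrative values only, not results from this issue.
stage2 = [2.31, 2.05, 1.87]
stage3 = [2.31, 2.06, 1.95]
print(first_loss_mismatch(stage2, stage3))  # 1
```

Knowing whether the runs diverge at step 0 (forward pass or initial sharding) or only after the first update (optimizer step) narrows down where to look.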

Environment

  • OS: Ubuntu 18.04
  • Python version: 3.7
  • Transformers version : 4.21.2
  • Whether to use Docker:
  • Misc.:

There is no optimiser implementation in oslo/torch/nn/parallel/data_parallel/data_parallel.py yet.
It will be added for ZeRO stages 2 and 3.
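As a rough illustration of what such an optimiser has to do, here is a minimal pure-Python sketch assuming the usual ZeRO scheme: gradients are reduce-scattered so each rank receives the averaged gradient for its own shard only, each rank updates just that shard, and the shards are all-gathered back into the full parameter. The collectives are simulated with lists, plain SGD stands in for the real optimiser, and all function names are hypothetical, not OSLO's API.

```python
from typing import List, Tuple


def shard_bounds(numel: int, world_size: int, rank: int) -> Tuple[int, int]:
    """Return the contiguous [start, end) slice of a flat parameter owned by `rank`."""
    base, extra = divmod(numel, world_size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def plain_sgd_step(params: List[float], grads: List[List[float]], lr: float) -> List[float]:
    """Reference update: average (all-reduce) full gradients, update the full parameter."""
    n = len(grads)
    avg = [sum(g[i] for g in grads) / n for i in range(len(params))]
    return [p - lr * a for p, a in zip(params, avg)]


def sharded_sgd_step(params: List[float], grads: List[List[float]], lr: float) -> List[float]:
    """Simulated ZeRO-style step; grads[r] is the local gradient computed on rank r."""
    world_size = len(grads)
    shards = []
    for rank in range(world_size):
        start, end = shard_bounds(len(params), world_size, rank)
        # Reduce-scatter: this rank only receives the averaged gradient for its slice.
        avg = [sum(g[i] for g in grads) / world_size for i in range(start, end)]
        # The rank updates only its slice; its optimiser state would be sharded the same way.
        shards.append([p - lr * a for p, a in zip(params[start:end], avg)])
    # All-gather: concatenate the updated shards back into the full parameter.
    return [x for shard in shards for x in shard]


params = [1.0, 2.0, 3.0, 4.0, 5.0]
grads = [[0.1, 0.2, 0.3, 0.4, 0.5], [0.3, 0.2, 0.1, 0.0, -0.1]]  # two simulated ranks
print(sharded_sgd_step(params, grads, lr=0.5) == plain_sgd_step(params, grads, lr=0.5))  # True
```

In this simulation each shard sees exactly the averaged gradient the unsharded update would use, so both paths give identical parameters. In a real multi-GPU run, differences in reduction order can still cause tiny floating-point drift, but a visible loss gap between stages would more likely indicate a sharding or optimiser bug.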

In addition, the cpu_offload handling in the FSDP code needs to be checked.