A distributed preconditioning optimizer that works with DeepSpeed's 3D parallelism.