lucidrains / alphafold2

To eventually become an unofficial Pytorch implementation / replication of Alphafold2, as details of the architecture get released


Add PyTorch Lightning to Enable Distributed Training and DeepSpeed

aribornstein opened this issue · comments

This work is awesome! I see that PL is already being used for some of the DataModules; it would be great to see LightningModule integration as well, to make training more robust. https://pytorch-lightning.readthedocs.io/en/stable/starter/converting.html

Hi there! It is always nice to find new people stopping by... We're glad you're finding it interesting!

So yes, in principle we plan to use PyTorch Lightning for the dataloaders and training scripts... will keep you posted!

@aribornstein hello! Yes, we are leaning toward pytorch-lightning, as long as it can support a use case of ours: we need to curriculum-learn the folding, starting from short sequences and moving to longer ones.
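The curriculum idea above can be scheduled independently of any framework: grow a maximum-length cutoff per epoch and filter the dataset against it. A small sketch, with hypothetical function names and illustrative constants:

```python
def curriculum_max_length(epoch: int, start_len: int = 64,
                          growth: int = 32, cap: int = 512) -> int:
    """Maximum sequence length allowed at a given epoch.

    Starts short and grows linearly until hitting `cap`;
    all constants here are illustrative, not tuned values.
    """
    return min(start_len + epoch * growth, cap)


def filter_by_length(sequences, max_len: int):
    """Keep only sequences short enough for the current curriculum stage."""
    return [s for s in sequences if len(s) <= max_len]


# Usage: at epoch 0 only the shortest sequence is trainable
seqs = ["A" * n for n in (50, 100, 300, 700)]
stage0 = filter_by_length(seqs, curriculum_max_length(0))  # cutoff 64
```

In a Lightning setup this filtering would naturally live in the DataModule, rebuilding the train dataloader each epoch with the current cutoff.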

Otherwise, we will also strongly consider DeepSpeed!

That's awesome! I believe PyTorch Lightning should support that. DeepSpeed is also fully integrated into Lightning, and its features are accessible in just a few lines of code. Let me know if you have any questions.

https://medium.com/pytorch-lightning/accessible-multi-billion-parameter-model-training-with-pytorch-lightning-deepspeed-c9333ac3bb59
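For reference, standalone DeepSpeed is driven by a JSON config file; a minimal hedged fragment (all values illustrative, not recommendations for this model) enabling mixed precision and ZeRO stage 2 optimizer-state sharding might look like:

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 2 }
}
```

When using DeepSpeed through Lightning, as the linked post describes, these settings are exposed via the Trainer instead, so most users never touch the JSON directly.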