bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Can we also train the BLOOM model using tensor parallelism and efficient fused CUDA kernels?

CloudedLeopard17 opened this issue

Hi,

Thanks for the great work. I was able to run inference on the BLOOM-7B1 model with 24 GB of GPU memory. Can we train the BLOOM models using tensor parallelism and efficient fused CUDA kernels? I don't have access to high-memory GPUs.

I don't think you will be able to do this on a 24 GB GPU. I am guessing you are using an RTX 3090?
You can give it a try.

I am using 2x A5000 GPUs. I was able to train the T5-XL model using tensor parallelism.

Did you use Megatron?
Or does DeepSpeed have support for tensor parallelism?

Deepspeed supports model parallelism (MP) to fit large models that would otherwise not fit in GPU memory.
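To make the idea concrete, here is a minimal NumPy sketch of how a column-parallel linear layer is split across two devices in tensor parallelism. This is an illustrative toy, not Megatron-DeepSpeed code: the two "devices" are just array shards, and the final concatenation stands in for the all-gather that real implementations perform across GPUs.

```python
import numpy as np

# Toy sketch of tensor (model) parallelism for one linear layer.
# Assumption: 2 "devices", simulated here as NumPy array shards.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # batch of activations (batch=4, hidden=8)
W = rng.standard_normal((8, 6))   # full weight matrix (hidden=8, out=6)

# Column-parallel split: each device holds half of the output columns,
# so each device stores only half of the layer's parameters.
W0, W1 = np.split(W, 2, axis=1)

y0 = x @ W0                       # partial output computed on "device 0"
y1 = x @ W1                       # partial output computed on "device 1"

# In a real setup this concatenation is an all-gather across GPUs.
y_parallel = np.concatenate([y0, y1], axis=1)

# The sharded computation matches the unsharded reference exactly.
y_full = x @ W
assert np.allclose(y_parallel, y_full)
```

In Megatron-style training the tensor-parallel degree is typically set with the `--tensor-model-parallel-size` launch argument, so with 2x A5000s you would try a degree of 2; whether BLOOM fits at that degree still depends on optimizer states and activation memory, not just the weight shards.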