bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Can we also train the BLOOM model using tensor parallelism and efficient fused CUDA kernels?

CloudedLeopard17 opened this issue

Hi,

Thanks for the great work. I was able to run inference on the BLOOM-7B1 model with 24 GB of GPU memory. Can we train the BLOOM models using tensor parallelism and efficient fused CUDA kernels? I don't have access to high-memory GPUs.

I don't think you will be able to do this on a 24 GB GPU. I am guessing you are using an RTX 3090?
You can give it a try.

I am using 2x A5000 GPUs. I was able to train the T5-XL model using tensor parallelism.

Did you use Megatron?
Or does DeepSpeed have support for tensor parallelism?

Deepspeed supports model parallelism (MP) to fit large models that would otherwise not fit in GPU memory.
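To make the idea concrete, here is a minimal NumPy sketch of how a column-parallel linear layer is split across two devices in tensor parallelism. This is an illustrative toy, not Megatron-DeepSpeed code: the two "devices" are just array shards, and the final concatenation stands in for the all-gather that real implementations perform across GPUs.

```python
import numpy as np

# Toy sketch of tensor (model) parallelism for one linear layer.
# Assumption: 2 "devices", simulated here as NumPy array shards.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # batch of activations (batch=4, hidden=8)
W = rng.standard_normal((8, 6))   # full weight matrix (hidden=8, out=6)

# Column-parallel split: each device holds half of the output columns,
# so each device stores only half of the layer's parameters.
W0, W1 = np.split(W, 2, axis=1)

y0 = x @ W0                       # partial output computed on "device 0"
y1 = x @ W1                       # partial output computed on "device 1"

# In a real setup this concatenation is an all-gather across GPUs.
y_parallel = np.concatenate([y0, y1], axis=1)

# The sharded computation matches the unsharded reference exactly.
y_full = x @ W
assert np.allclose(y_parallel, y_full)
```

In Megatron-style training the tensor-parallel degree is typically set with the `--tensor-model-parallel-size` launch argument, so with 2x A5000s you would try a degree of 2; whether BLOOM fits at that degree still depends on optimizer states and activation memory, not just the weight shards.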