microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Fine-tune llama2 with sequence parallelism

AnirudhVIyer opened this issue · comments

Hi,
I am trying to fine-tune a llama2 model with sequence parallelism using Megatron-DS. Is there any documentation for this?

Do you mean sequence parallelism as proposed in this work (tensor parallelism extended to the non-matmul operations) or sequence parallelism as in DeepSpeed Ulysses (sharding the input along the sequence length dimension)? Have you already tried the --sequence-parallel option and related ones?
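
For reference, the Megatron-style variant is enabled by adding --sequence-parallel on top of tensor parallelism in the launch script; a rough sketch is below. The script name, model dimensions, and the Ulysses flag mentioned in the comments are assumptions based on typical Megatron-DeepSpeed examples, not a verified llama2 fine-tuning recipe.

```bash
# Rough sketch only: Megatron-style sequence parallelism is layered on top of
# tensor parallelism. The script name, sizes, and data/tokenizer arguments are
# placeholders; adapt them to your llama2 fine-tuning setup.
deepspeed pretrain_gpt.py \
    --tensor-model-parallel-size 4 \
    --sequence-parallel \
    --num-layers 32 \
    --hidden-size 4096 \
    --num-attention-heads 32 \
    --seq-length 4096 \
    --micro-batch-size 1 \
    --bf16 \
    --deepspeed \
    --deepspeed_config ds_config.json
# For DeepSpeed Ulysses (sharding along the sequence dimension), the repo's
# Ulysses examples use a separate flag (assumed: --ds-sequence-parallel-size N)
# instead of --sequence-parallel; check those examples for the exact name and
# constraints on attention-head divisibility.
```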