microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Fine-tune llama2 with sequence parallelism

AnirudhVIyer opened this issue · comments

Hi,
I am trying to fine-tune a llama2 model with sequence parallelism using Megatron-DS. Is there any documentation for this?

Do you mean sequence parallelism as proposed in this work (tensor parallelism extended to the non-matmul operations) or sequence parallelism as in DeepSpeed Ulysses (sharding the input along the sequence length dimension)? Have you already tried the --sequence-parallel option and related ones?
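
For reference, the Megatron-style variant is enabled by adding --sequence-parallel on top of tensor parallelism in the launch script; a rough sketch is below. The script name, model dimensions, and the Ulysses flag mentioned in the comments are assumptions based on typical Megatron-DeepSpeed examples, not a verified llama2 fine-tuning recipe.

```bash
# Rough sketch only: Megatron-style sequence parallelism is layered on top of
# tensor parallelism. The script name, sizes, and data/tokenizer arguments are
# placeholders; adapt them to your llama2 fine-tuning setup.
deepspeed pretrain_gpt.py \
    --tensor-model-parallel-size 4 \
    --sequence-parallel \
    --num-layers 32 \
    --hidden-size 4096 \
    --num-attention-heads 32 \
    --seq-length 4096 \
    --micro-batch-size 1 \
    --bf16 \
    --deepspeed \
    --deepspeed_config ds_config.json
# For DeepSpeed Ulysses (sharding along the sequence dimension), the repo's
# Ulysses examples use a separate flag (assumed: --ds-sequence-parallel-size N)
# instead of --sequence-parallel; check those examples for the exact name and
# constraints on attention-head divisibility.
```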