Fine-tune llama2 with sequence parallelism
AnirudhVIyer opened this issue
Anirudh Iyer commented
Hi,
I am trying to fine-tune a llama2 model with sequence parallelism using Megatron-DS. Is there any documentation for this?
Namrata Shivagunde commented
+1
puppet101 commented
+2
Stephan Kö. commented
Do you mean sequence parallelism as proposed in this work (tensor parallelism for non-matmul operations), or sequence parallelism as in DeepSpeed Ulysses (sharding the input along the sequence-length dimension)? Have you already tried the --sequence-parallel option and related ones?
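For concreteness, here is a minimal sketch of how each variant is typically enabled. The flag names below are taken from the upstream Megatron-LM / Megatron-DeepSpeed argument parsers as I remember them; the script name and elided arguments are placeholders, so please verify everything against arguments.py in your checkout before relying on it:

```bash
# (a) Megatron-style sequence parallelism (Korthikanti et al.): shards the
#     activations of non-matmul ops (LayerNorm, dropout) across the
#     tensor-parallel group. Only meaningful together with tensor parallelism.
deepspeed pretrain_gpt.py \
  --tensor-model-parallel-size 4 \
  --sequence-parallel \
  ...  # model, data, and DeepSpeed config args elided

# (b) DeepSpeed-Ulysses: shards the input along the sequence dimension and
#     uses all-to-all communication inside the attention layers.
deepspeed pretrain_gpt.py \
  --ds-sequence-parallel-size 4 \
  ...  # typically requires a FlashAttention-compatible attention backend
```

Note that (a) is an optimization layered on top of tensor parallelism, while (b) is what you want if your goal is training on long sequences; the two are configured independently.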