epfLLM/Megatron-LLM
distributed trainer for LLMs
Stargazers: 504 · Watchers: 18 · Issues: 58 · Forks: 75
epfLLM/Megatron-LLM Issues
llama2-7B AssertionError: padded_vocab_size value from checkpoint (32000) is not equal to the input argument value (32256) #81 · Updated 2 months ago · 1 comment
Introduce Sailor Models · Closed 3 months ago · 1 comment
Gemma Support · Closed 3 months ago
Error in document (https://epfllm.github.io/Megatron-LLM/guide/instruction_tuning.html#data-preprocessing) · Updated 4 months ago
Does it support sequence parallel? · Closed 5 months ago · 1 comment
Multi nodes · Closed 5 months ago · 1 comment
Any plans to rebase the codebase to most recent Megatron-LM for MoE? · Updated 5 months ago
Correctness when enabling FlashAttention + Sequence Parallel at the same time? · Closed 5 months ago · 2 comments
Support QWen? · Updated 6 months ago · 1 comment
How to load from a saved intermediate checkpoint? · Closed 6 months ago · 3 comments
LLaMA2-70B Inference Optmization · Closed 6 months ago · 1 comment
error: preprocess.py file error while working on custom data · Updated 6 months ago
LLaMa and Mistral 7B pretraining support · Closed 7 months ago · 2 comments
One question about the permute function code in permute_qkv.py · Updated 7 months ago · 2 comments
llama2-7B AssertionError: padded_vocab_size value from checkpoint (32000) is not equal to the input argument value (32256) · Closed 7 months ago · 1 comment
args.make_vocab_size_divisible_by set failed · Closed 7 months ago · 1 comment
Support for Mistral · Closed 8 months ago · 7 comments
Nice-to-have training features · Updated 9 months ago
RuntimeError: seq_len <= 2048 INTERNAL ASSERT FAILED · Closed 10 months ago · 4 comments
[Megatron Base Version] Would mind share the based version of Megatron ? · Closed 10 months ago · 7 comments
run finetune llama2-7B error · Closed 10 months ago · 2 comments
finetune llama2-7B when set --seq_length 4096 error · Closed 10 months ago · 1 comment
run finetune llama2-7B error · Closed 10 months ago · 1 comment
Prepend bos token · Closed 10 months ago · 1 comment
[Swiglu] question about swiglu · Closed 10 months ago · 6 comments
Feature Request: Can we directly use the huggingface dataset for training · Closed 10 months ago · 4 comments
Loading weights from hf conversion with different TP,PP settings · Closed 10 months ago · 14 comments
Getting started "shard" model not working · Closed 10 months ago · 9 comments
RuntimeError: mat1 and mat2 shapes cannot be multiplied (29056x22016 and 11008x4096) · Closed 10 months ago · 2 comments
[Save checkpoint needs long time] · Closed 10 months ago · 2 comments
support falcon 180B · Updated 10 months ago
iteration-time increases linearly when micro_batch_size=1 · Closed a year ago · 1 comment
iteration-time increases linearly (for TP=2, PP=1 & TP=1, PP=2) · Closed a year ago · 8 comments
convert huggingface model to megatron. "Only llama v2 available using huggingface" · Closed a year ago · 1 comment
llama2 & vocabulary padding (making embedding layer sizes divisible by 128) · Closed a year ago · 1 comment
Add update_to_hub docs · Closed a year ago
dose 8 A100 80g enough to finetune 70b llama2 ? · Closed a year ago · 5 comments
HF LLaMA -> megatron weight · Closed a year ago · 5 comments
how to convert baichuan-13b into megatron weights? · Closed a year ago · 3 comments
Convert LLama-30B to Megatron Error · Closed a year ago · 1 comment
add GQA(MQA) support in megatron2hf conversion · Closed a year ago
Generate HuggingFace tokenizer configuration as part of megatron2hf.py (weight conversion) · Closed a year ago · 2 comments
Add falcon support in megatron2hf.py · Closed a year ago · 4 comments
Passed position_ids are ignored for `PositionEmbeddingType.rotary` · Closed a year ago · 1 comment
NaN detection possibly ineffective · Closed a year ago
Validation metrics are not logged to wandb · Closed a year ago · 1 comment
convert_llama2hf.py should be replaced with newer version · Closed a year ago · 3 comments
cuda misaligned address in pretrain llama2 7B · Closed a year ago · 2 comments
The training speed is two times slower than the Megatron-LM and Megatron-Deepspeed · Closed a year ago · 5 comments
Error during merge of sharded checkpoint: 'TransformerLanguageModel' object has no attribute 'lm_head' · Closed a year ago · 1 comment