epfLLM/Megatron-LLM
distributed trainer for LLMs
Stargazers: 504 · Watchers: 18 · Issues: 58 · Forks: 75
epfLLM/Megatron-LLM Issues
llama2-7B AssertionError: padded_vocab_size value from checkpoint (32000) is not equal to the input argument value (32256) #81 · Updated 2 months ago · 1 comment
Introduce Sailor Models · Closed 3 months ago · 1 comment
Gemma Support · Closed 3 months ago
Error in document (https://epfllm.github.io/Megatron-LLM/guide/instruction_tuning.html#data-preprocessing) · Updated 4 months ago
Does it support sequence parallel? · Closed 5 months ago · 1 comment
Multi nodes · Closed 5 months ago · 1 comment
Any plans to rebase the codebase to most recent Megatron-LM for MoE? · Updated 5 months ago
Correctness when enabling FlashAttention + Sequence Parallel at the same time? · Closed 5 months ago · 2 comments
Support QWen? · Updated 6 months ago · 1 comment
How to load from a saved intermediate checkpoint? · Closed 6 months ago · 3 comments
LLaMA2-70B Inference Optmization · Closed 6 months ago · 1 comment
error: preprocess.py file error while working on custom data · Updated 6 months ago
LLaMa and Mistral 7B pretraining support · Closed 7 months ago · 2 comments
One question about the permute function code in permute_qkv.py · Updated 7 months ago · 2 comments
llama2-7B AssertionError: padded_vocab_size value from checkpoint (32000) is not equal to the input argument value (32256) · Closed 7 months ago · 1 comment
args.make_vocab_size_divisible_by set failed · Closed 7 months ago · 1 comment
Support for Mistral · Closed 8 months ago · 7 comments
Nice-to-have training features · Updated 9 months ago
RuntimeError: seq_len <= 2048 INTERNAL ASSERT FAILED · Closed 10 months ago · 4 comments
[Megatron Base Version] Would mind share the based version of Megatron ? · Closed 10 months ago · 7 comments
run finetune llama2-7B error · Closed 10 months ago · 2 comments
finetune llama2-7B when set --seq_length 4096 error · Closed 10 months ago · 1 comment
run finetune llama2-7B error · Closed 10 months ago · 1 comment
Prepend bos token · Closed 10 months ago · 1 comment
[Swiglu] question about swiglu · Closed 10 months ago · 6 comments
Feature Request: Can we directly use the huggingface dataset for training · Closed 10 months ago · 4 comments
Loading weights from hf conversion with different TP,PP settings · Closed 10 months ago · 14 comments
Getting started "shard" model not working · Closed 10 months ago · 9 comments
RuntimeError: mat1 and mat2 shapes cannot be multiplied (29056x22016 and 11008x4096) · Closed 10 months ago · 2 comments
[Save checkpoint needs long time] · Closed 10 months ago · 2 comments
support falcon 180B · Updated 10 months ago
iteration-time increases linearly when micro_batch_size=1 · Closed a year ago · 1 comment
iteration-time increases linearly (for TP=2, PP=1 & TP=1, PP=2) · Closed a year ago · 8 comments
convert huggingface model to megatron. "Only llama v2 available using huggingface" · Closed a year ago · 1 comment
llama2 & vocabulary padding (making embedding layer sizes divisible by 128) · Closed a year ago · 1 comment
Add update_to_hub docs · Closed a year ago
dose 8 A100 80g enough to finetune 70b llama2 ? · Closed a year ago · 5 comments
HF LLaMA -> megatron weight · Closed a year ago · 5 comments
how to convert baichuan-13b into megatron weights? · Closed a year ago · 3 comments
Convert LLama-30B to Megatron Error · Closed a year ago · 1 comment
add GQA(MQA) support in megatron2hf conversion · Closed a year ago
Generate HuggingFace tokenizer configuration as part of megatron2hf.py (weight conversion) · Closed a year ago · 2 comments
Add falcon support in megatron2hf.py · Closed a year ago · 4 comments
Passed position_ids are ignored for `PositionEmbeddingType.rotary` · Closed a year ago · 1 comment
NaN detection possibly ineffective · Closed a year ago
Validation metrics are not logged to wandb · Closed a year ago · 1 comment
convert_llama2hf.py should be replaced with newer version · Closed a year ago · 3 comments
cuda misaligned address in pretrain llama2 7B · Closed a year ago · 2 comments
The training speed is two times slower than the Megatron-LM and Megatron-Deepspeed · Closed a year ago · 5 comments
Error during merge of sharded checkpoint: 'TransformerLanguageModel' object has no attribute 'lm_head' · Closed a year ago · 1 comment