Stargazers: 765
Watchers: 12
Issues: 34
Forks: 131
stanford-futuredata/megablocks Issues
[integrating megablocks with open_lm] Question about megablocks + FSDP (Closed 6 months ago, 9 comments)
RuntimeError: Triton Error [CUDA]: invalid argument (Updated 2 months ago, 15 comments)
Does this framework support SFT? (Updated 2 months ago, 2 comments)
Has anyone encountered this CUDA error? (Closed 6 months ago, 15 comments)
AMP + BF16 failing (Updated 4 months ago, 2 comments)
Unsharding scripts for megablocks models (Updated 5 months ago)
the wrong loss func was chosen at evaluation (Updated 5 months ago, 2 comments)
Seeking a good multi-node training config (Updated 5 months ago, 3 comments)
selective router precision (Updated 5 months ago, 1 comment)
different load_balancing_loss with different pipeline_parallel_size (Updated 5 months ago, 8 comments)
Error from pip about missing torch module (Closed 5 months ago, 4 comments)
Docker issues with PyPI installation (Updated 5 months ago, 3 comments)
ParallelDroplessMLP initialises self.mlp twice (Updated 5 months ago, 6 comments)
Gradient scale size for expert gradient (Closed 5 months ago, 4 comments)
save loading_balancing_loss properly (Closed 5 months ago, 2 comments)
How to integrate to transformers-based mixtral (Updated 5 months ago, 1 comment)
Why the second matrix of the mlp layer has the same shape of the first one? (Updated 5 months ago, 1 comment)
[BUG] Optimizer Weights Not Reloaded When Training with bf16 Pretrained Weights (Updated 5 months ago, 1 comment)
Comparison against top-2 routing? (Updated 5 months ago, 4 comments)
Script for Full Fine-Tuning of Mixtral (Updated 5 months ago, 1 comment)
Efficiency of torch mlp (Closed 6 months ago, 2 comments)
How do you use routing balancing loss under pipeline parallelism (Closed 6 months ago, 5 comments)
Question on offsets in figures 5 (Closed 6 months ago, 1 comment)
How to add support for swiglu in Megablocks? (Closed 6 months ago, 14 comments)
Wrong outputs for hidden dim 14336 (Closed 6 months ago, 3 comments)
About the Multi-node Script (Closed 6 months ago, 4 comments)
Inference code (Closed 6 months ago, 5 comments)
How to pip install the latest megablocks? (Closed 6 months ago, 2 comments)
Installation fails due to missing mosaicml-turbo (Closed 6 months ago, 2 comments)
Latest GitHub release version higher than main branch setup.py (Closed 6 months ago, 4 comments)
Why not support tensor model parallel? (Closed 7 months ago, 7 comments)
multi-node problem (Closed 9 months ago, 5 comments)
Does megablocks support the true expert parallelism? (Closed 9 months ago, 2 comments)
Current installation instructions don't quite work (Closed a year ago, 1 comment)