stanford-futuredata/megablocks Issues

[integrating megablocks with open_lm] Question about megablocks + FSDP
Closed 6 months ago · 9 comments
RuntimeError: Triton Error [CUDA]: invalid argument
Updated 2 months ago · 15 comments
Does this framework support SFT?
Updated 2 months ago · 2 comments
Has anyone encountered this CUDA error?
Closed 6 months ago · 15 comments
AMP + BF16 failing
Updated 4 months ago · 2 comments
Unsharding scripts for megablocks models
Updated 5 months ago
the wrong loss func was chosen at evaluation
Updated 5 months ago · 2 comments
Seeking a good multi-node training config
Updated 5 months ago · 3 comments
selective router precision
Updated 5 months ago · 1 comment
different load_balancing_loss with different pipeline_parallel_size
Updated 5 months ago · 8 comments
Error from pip about missing torch module
Closed 5 months ago · 4 comments
Docker issues with PyPI installation
Updated 5 months ago · 3 comments
ParallelDroplessMLP initialises self.mlp twice
Updated 5 months ago · 6 comments
Gradient scale size for expert gradient
Closed 5 months ago · 4 comments
save loading_balancing_loss properly
Closed 5 months ago · 2 comments
How to integrate to transformers-based mixtral
Updated 5 months ago · 1 comment
Why the second matrix of the mlp layer has the same shape of the first one?
Updated 5 months ago · 1 comment
[BUG] Optimizer Weights Not Reloaded When Training with bf16 Pretrained Weights
Updated 5 months ago · 1 comment
Comparison against top-2 routing?
Updated 5 months ago · 4 comments
Script for Full Fine-Tuning of Mixtral
Updated 5 months ago · 1 comment
Efficiency of torch mlp
Closed 6 months ago · 2 comments
How do you use routing balancing loss under pipeline parallelism
Closed 6 months ago · 5 comments
Question on offsets in figures 5
Closed 6 months ago · 1 comment
How to add support for swiglu in Megablocks?
Closed 6 months ago · 14 comments
Wrong outputs for hidden dim 14336
Closed 6 months ago · 3 comments
About the Multi-node Script
Closed 6 months ago · 4 comments
Inference code
Closed 6 months ago · 5 comments
How to pip install the latest megablocks?
Closed 6 months ago · 2 comments
Installation fails due to missing mosaicml-turbo
Closed 6 months ago · 2 comments
Latest GitHub release version higher than main branch setup.py
Closed 6 months ago · 4 comments
Why not support tensor model parallel?
Closed 7 months ago · 7 comments
multi-node problem
Closed 9 months ago · 5 comments
Does megablocks support the true expert parallelism?
Closed 9 months ago · 2 comments
Current installation instructions don't quite work
Closed a year ago · 1 comment