timlee0212 / SiDA-MoE

Code for MLSys 2024 Paper "SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models"


Collecting Activations for Large Models

  1. Run `python main.py --model=xxx --sharding`. The script loads the pretrained weights from HF into our customized model and saves them in a sharded format under `./result/[DATABASE]/[MODEL]/ShardedCkpt`.
  2. Run `python main.py --model=xxx` to perform inference with the HF `load_and_dispatch` loading path and collect the activations for later use; see the sketches after this list.
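
For reference, here is a minimal sketch of what step 1's sharded export could look like, assuming the standard `transformers` `save_pretrained` sharding API. The actual `--sharding` path uses this repo's customized model, so details will differ; `google/switch-base-8` is only an illustrative MoE checkpoint.

```python
# Hypothetical sketch of a sharded export (step 1); not this repo's code.
from transformers import SwitchTransformersForConditionalGeneration

# Illustrative MoE checkpoint; substitute the model passed via --model.
model = SwitchTransformersForConditionalGeneration.from_pretrained(
    "google/switch-base-8"
)

# save_pretrained splits the weights into shards no larger than
# max_shard_size and writes an index file mapping tensors to shards.
model.save_pretrained(
    "./result/DATABASE/MODEL/ShardedCkpt",  # placeholder path from step 1
    max_shard_size="2GB",
)
```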
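
And a sketch of what step 2 does conceptually: load the sharded checkpoint with Accelerate's `load_checkpoint_and_dispatch` and record router activations with PyTorch forward hooks. The `"router"` name filter and the tuple layout of the router output are assumptions about the HF Switch Transformers implementation, not this repo's collection code.

```python
# Hypothetical sketch of activation collection (step 2); not this repo's code.
import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import (
    AutoConfig,
    AutoTokenizer,
    SwitchTransformersForConditionalGeneration,
)

ckpt_dir = "./result/DATABASE/MODEL/ShardedCkpt"  # output of step 1

# Build the model skeleton without allocating real weights, then stream the
# sharded checkpoint in and place layers across available devices.
config = AutoConfig.from_pretrained("google/switch-base-8")
with init_empty_weights():
    model = SwitchTransformersForConditionalGeneration(config)
model = load_checkpoint_and_dispatch(model, ckpt_dir, device_map="auto")

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Assumption: the router returns a tuple whose first element holds
        # per-token expert assignments; keep a CPU copy to free GPU memory.
        out = output[0] if isinstance(output, tuple) else output
        activations.setdefault(name, []).append(out.detach().cpu())
    return hook

for name, module in model.named_modules():
    if "router" in name:  # assumption: gating modules have "router" in their name
        module.register_forward_hook(make_hook(name))

tok = AutoTokenizer.from_pretrained("google/switch-base-8")
batch = tok("Collect expert activations from this sentence.", return_tensors="pt")
with torch.no_grad():
    model.generate(**batch, max_new_tokens=8)

torch.save(activations, "activations.pt")  # persist for later analysis
```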

TODO:

- [ ] Add disk offload function.
- [ ] Process the sharded format when the model size exceeds main memory.

License: MIT

