ByronHsu / smoe-debug

Enhancing ScatterMoE

Reference

Versions & Environments

  1. Pip: please see requirements.txt
  2. Python: 3.10.2

Notes

  1. To make smoe work with Python 3.10.2 (the only version we have available), I removed the Triton autotune and hardcoded a static config (issue)
  2. I extracted the smoe-related code into a single file to make testing easier
  3. I removed allow_tf32 because it seems to affect precision
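
Why allow_tf32 could plausibly hurt precision: TF32 keeps the fp32 exponent but only 10 explicit mantissa bits (fp32 has 23), so matmul inputs are rounded much more coarsely. The snippet below is a CPU-only sketch that emulates that input rounding in plain Python. It is not the kernel code from this repo; the helper names (`to_fp32`, `to_tf32`, `dot`) are made up for illustration, and the round-half-away-from-zero mode is an assumption, since real tensor cores may round differently.

```python
import random
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_tf32(x: float) -> float:
    """Emulate TF32 input rounding: fp32 with only the top 10 mantissa bits kept.

    Assumption: round half away from zero; hardware behaviour may differ.
    Intended for ordinary finite values, not inf/NaN.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add half of the dropped range (2**12), then clear the low 13 mantissa bits.
    bits = (bits + 0x1000) & 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot(a, b, convert=lambda v: v):
    """Dot product with an optional per-input rounding step; accumulates in fp64."""
    return sum(convert(x) * convert(y) for x, y in zip(a, b))


random.seed(0)
a = [random.uniform(-1, 1) for _ in range(4096)]
b = [random.uniform(-1, 1) for _ in range(4096)]

# Worst per-element rounding error for each format.
max_fp32_err = max(abs(to_fp32(v) - v) for v in a)
max_tf32_err = max(abs(to_tf32(v) - v) for v in a)

# Error of the whole dot product relative to the unrounded fp64 result.
exact = dot(a, b)
print("max input rounding error  fp32:", max_fp32_err, " tf32:", max_tf32_err)
print("dot product abs error     fp32:", abs(dot(a, b, to_fp32) - exact))
print("dot product abs error     tf32:", abs(dot(a, b, to_tf32) - exact))
```

This only models the rounding of inputs. It says nothing about whether the precision loss matters for a particular model, which is what the correctness test is for.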

Observation

  1. The fp16 kernel is slower than the Hugging Face implementation
  2. The fp32 kernel is much faster than the Hugging Face implementation, as claimed in the paper

Install dependencies

$ pip install -r requirements.txt

Correctness test

$ pytest test_moe.py::test_correctness -s

Speed test

$ pytest test_moe.py::test_bench_speed_moe_wrapper -s
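
For reference, here is a minimal sketch of how a speed comparison like this can be timed. It is not the code in test_moe.py: `bench`, `baseline` and `candidate` are hypothetical names, and the workloads are CPU stand-ins. The one GPU-specific point is the `sync` argument: CUDA kernel launches are asynchronous, so when timing GPU code you would pass something like `torch.cuda.synchronize` so the clock only stops once the work has actually finished.

```python
import statistics
import time


def bench(fn, warmup=3, iters=10, sync=None):
    """Return the median wall-clock time of fn() in milliseconds.

    sync: optional zero-argument callable that blocks until pending work is
    done (for GPU code, e.g. torch.cuda.synchronize). Not needed on CPU.
    """
    for _ in range(warmup):
        fn()  # warm-up runs are not timed (caches, lazy init, JIT compilation)
    times = []
    for _ in range(iters):
        if sync is not None:
            sync()  # drain earlier async work before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for fn's work to finish before stopping the clock
        times.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times)


# Stand-in workloads; in practice these would be the two MoE forward passes.
def baseline():
    return sum(i * i for i in range(20000))


def candidate():
    return sum(i * i for i in range(2000))


base_ms = bench(baseline)
cand_ms = bench(candidate)
print(f"baseline {base_ms:.3f} ms, candidate {cand_ms:.3f} ms, "
      f"speedup {base_ms / cand_ms:.2f}x")
```

The median is used instead of the mean so that a single slow iteration does not skew the result.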

Memory test

$ pytest test_moe.py::test_bench_memory_moe_wrapper -s

Results

Image                            Description
moe-full-fp32-speed-benchmark    FP32 Speed Benchmark for MoE
moe-full-fp16-speed-benchmark    FP16 Speed Benchmark for MoE
moe-full-memory-benchmark        Memory Benchmark for MoE
