- Pip: please see requirements.txt
- Python: 3.10.2
- To make smoe work with Python 3.10.2 (the only version we have), I removed the Triton autotune and hardcoded a static config (issue)
- I extracted the smoe-related code into one file for testing convenience
- I removed allow_tf32 because it seems to affect numerical precision
- The fp16 kernel is slower than the Hugging Face implementation
- The fp32 kernel is much faster than the Hugging Face implementation, as claimed in the paper
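
The note about allow_tf32 can be illustrated without a GPU. TF32 keeps fp32's 8 exponent bits but only 10 of its 23 mantissa bits, so inputs lose precision before they are multiplied. The sketch below is a rough emulation in pure Python, not the Triton kernel itself: the helper names `to_fp32`, `to_tf32`, and `dot` are hypothetical, and the round-to-nearest step is a simplifying assumption about how the hardware converts values.

```python
import random
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_tf32(x: float) -> float:
    """Emulate TF32 by keeping 10 of fp32's 23 mantissa bits (assumed rounding)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add half of the dropped range, then clear the low 13 mantissa bits.
    bits = (bits + 0x1000) & 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot(a, b, cast):
    """Dot product with inputs rounded by `cast`; accumulation stays in fp64."""
    return sum(cast(x) * cast(y) for x, y in zip(a, b))


random.seed(0)
a = [random.uniform(-1.0, 1.0) for _ in range(4096)]
b = [random.uniform(-1.0, 1.0) for _ in range(4096)]

exact = dot(a, b, float)
err_fp32 = abs(dot(a, b, to_fp32) - exact)
err_tf32 = abs(dot(a, b, to_tf32) - exact)

print(f"fp32 input rounding error: {err_fp32:.3e}")
print(f"tf32 input rounding error: {err_tf32:.3e}")
```

Each TF32 input can be off by a relative error of roughly 2^-11, compared with 2^-24 for fp32, which is large enough to trip a tight tolerance in a correctness test.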
$ pip install -r requirements.txt
$ pytest test_moe.py::test_correctness -s
$ pytest test_moe.py::test_bench_speed_moe_wrapper -s
$ pytest test_moe.py::test_bench_memory_moe_wrapper -s
| Image | Description |
| --- | --- |
| | FP32 Speed Benchmark for MoE |
| | FP16 Speed Benchmark for MoE |
| | Memory Benchmark for MoE |