Fast `softmax` kernel
maekawatoshiki opened this issue
- We need a better (faster) `softmax` implementation for the CPU backend. Time spent in `softmax` when running GPT-2:
| Runtime | `softmax` time in GPT-2 (ms) |
|---|---|
| onnxruntime | 1.5 |
| altius | 2.4 |
- The performance degradation was caused by OpenMP: setting `OMP_WAIT_POLICY=active`, which keeps idle OpenMP threads spinning instead of sleeping, solves this.
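For reference, the operation benchmarked above is the numerically stable softmax along the last axis. The sketch below is plain Python that only illustrates the three passes over each row (maximum, exponentiate-and-sum, normalize); it is not the altius or onnxruntime implementation, and the function names are hypothetical. A fast CPU kernel might instead vectorize these loops with SIMD, use a cheaper approximation of `exp`, and split the independent rows across threads.

```python
import math
from typing import List


def softmax_row(x: List[float]) -> List[float]:
    """Numerically stable softmax over a single row (hypothetical helper)."""
    if not x:
        return []
    # Pass 1: the row maximum, subtracted so that exp() never overflows.
    m = max(x)
    # Pass 2: exponentiate the shifted values and accumulate the normalizer.
    exps = [math.exp(v - m) for v in x]
    total = math.fsum(exps)
    # Pass 3: one division, then a multiplication per element.
    inv = 1.0 / total
    return [e * inv for e in exps]


def softmax(rows: List[List[float]]) -> List[List[float]]:
    """Softmax along the last axis. Rows are independent of each other,
    which is what lets a real kernel parallelize across them."""
    return [softmax_row(r) for r in rows]
```

For example, `softmax([[1000.0, 1000.0, 1000.0]])` returns a row of three 1/3 values, whereas evaluating `math.exp(1000.0)` directly would raise `OverflowError`.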
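One practical detail of the `OMP_WAIT_POLICY=active` fix: the OpenMP specification says the runtime ignores changes to its environment variables made after the program has started, so the variable has to be in the environment when the process launches. A minimal sketch of a benchmark driver doing that, assuming the runtime under test is started as a child process; `omp_env` is a hypothetical helper, and the child command is only a stand-in that echoes the variable back.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def omp_env(policy: str = "active", base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: the current environment) with
    OMP_WAIT_POLICY set, for use as a child process's environment."""
    if policy.lower() not in ("active", "passive"):
        raise ValueError(f"OMP_WAIT_POLICY must be 'active' or 'passive', got {policy!r}")
    env = dict(os.environ if base is None else base)
    env["OMP_WAIT_POLICY"] = policy
    return env


if __name__ == "__main__":
    # Stand-in child: a real driver would launch the runtime being benchmarked here.
    out = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['OMP_WAIT_POLICY'])"],
        env=omp_env("active"),
        capture_output=True,
        text=True,
        check=True,
    )
    print(out.stdout.strip())  # prints "active"
```

The trade-off is that, with the active policy, idle OpenMP threads keep consuming CPU cycles while they wait, in exchange for lower wake-up latency between short parallel regions such as per-operator `softmax` calls.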