maekawatoshiki / altius

Small ONNX inference runtime written in Rust

Repository from GitHub: https://github.com/maekawatoshiki/altius

Fast `softmax` kernel

maekawatoshiki opened this issue 2 years ago

uint256_t commented 2 years ago

We need a faster `softmax` implementation for the CPU backend.

| runtime | softmax in gpt2 (ms) |
| --- | --- |
| onnxruntime | 1.5 |
| altius | 2.4 |
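For readers unfamiliar with the operation being timed, here is a minimal single-threaded sketch of a numerically stable softmax in Rust. It is not the altius kernel: the names `softmax_row` and `softmax_2d` are hypothetical, the input is assumed to be a finite, row-major `f32` buffer, and no SIMD or threading is attempted.

```rust
/// Numerically stable softmax applied in place to one row.
/// `softmax_row` is a hypothetical name, not an altius API.
/// Assumes all inputs are finite.
pub fn softmax_row(row: &mut [f32]) {
    if row.is_empty() {
        return;
    }
    // Pass 1: find the row maximum so exp() never receives a positive argument.
    let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Pass 2: exponentiate the shifted values and accumulate their sum.
    let mut sum = 0.0f32;
    for x in row.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    // Pass 3: normalize by multiplying with the reciprocal of the sum.
    let inv = 1.0 / sum;
    for x in row.iter_mut() {
        *x *= inv;
    }
}

/// Applies `softmax_row` to every row of a row-major `[rows, cols]` buffer.
/// `softmax_2d` is likewise a hypothetical helper.
pub fn softmax_2d(data: &mut [f32], cols: usize) {
    assert!(cols > 0 && data.len() % cols == 0);
    for row in data.chunks_exact_mut(cols) {
        softmax_row(row);
    }
}
```

Subtracting the maximum keeps large logits from overflowing in `exp`. A faster kernel would typically vectorize the three passes and split rows across threads, which is where a threading runtime such as OpenMP comes into play.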

uint256_t commented 2 years ago

The performance gap was caused by OpenMP. Setting the environment variable `OMP_WAIT_POLICY=active` resolves it.
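OpenMP runtimes generally read `OMP_WAIT_POLICY` when they initialize, so the variable needs to be in the environment before the process that uses OpenMP starts. A minimal sketch of doing that from a Rust launcher follows; the helper name `with_active_wait` and the program path passed to it are hypothetical and not part of altius.

```rust
use std::process::Command;

/// Builds a command for `program` with OMP_WAIT_POLICY=active in its environment.
/// `with_active_wait` is a hypothetical helper; `program` is whatever binary
/// runs the OpenMP-backed inference.
pub fn with_active_wait(program: &str) -> Command {
    let mut cmd = Command::new(program);
    // With the active policy, waiting OpenMP threads spin instead of sleeping,
    // trading idle CPU time for lower wake-up latency between parallel regions.
    cmd.env("OMP_WAIT_POLICY", "active");
    cmd
}
```

Calling `.status()` or `.spawn()` on the returned `Command` starts the child with the variable set. Exporting the variable in the shell before launching has the same effect.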