microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime

[Feature Request] phi-3-small-128k-onnx-cpu model

Ben-Epstein opened this issue

The onnx-gpu model for phi-3-small-128k is great, a perfect balance of quality and speed. Is there a plan to support a cpu version?

Thanks!

The Phi-Small model contains the SparseAttention operator and requires the kernel to be defined and implemented in ONNX Runtime. As of now, we only have the kernel implemented for CUDA.
We intend to add a CPU kernel as well in the near future. Once that is added, we will be able to support Phi-Small on CPU as well.

> We intend to add a CPU kernel as well in the near future. Once that is added, we will be able to support Phi-Small on CPU as well.

Interesting... according to these two pages, "Run Phi-3 language models with the ONNX Runtime generate() API" and "Run the Phi-3 vision model with the ONNX Runtime generate() API", they can run on CPU. Am I missing something?

Just an FYI, I'm new to Python/CUDA/PyTorch/etc. and this ecosystem.

> Interesting... according to these two pages, "Run Phi-3 language models with the ONNX Runtime generate() API" and "Run the Phi-3 vision model with the ONNX Runtime generate() API", they can run on CPU. Am I missing something?

All Phi-3 family models except phi-3-small can run on CPU. The linked documentation doesn't mention the phi-3-small model; maybe we should explicitly call that out in the doc.
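For the variants that do run on CPU (e.g. phi-3-mini), a minimal sketch with the onnxruntime-genai Python package might look like the following. The model directory path, the `max_length` value, and the function name are assumptions for illustration, and the generator API surface has changed between package versions, so treat this as a sketch rather than an official example:

```python
def run_phi3_cpu(model_dir: str, prompt: str, max_length: int = 256) -> str:
    """Generate text from a Phi-3 ONNX model on CPU.

    model_dir is assumed to point at a CPU export, e.g. the
    cpu_and_mobile/cpu-int4-rtn-block-32 folder of a Phi-3-mini
    ONNX release; this is a sketch, not an official sample.
    """
    # Imported lazily so the module loads even without the package.
    import onnxruntime_genai as og  # pip install onnxruntime-genai

    model = og.Model(model_dir)        # a CPU export runs on the CPU EP
    tokenizer = og.Tokenizer(model)

    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(prompt))

    # Token-by-token decode loop until max_length or an end token.
    while not generator.is_done():
        generator.generate_next_token()

    return tokenizer.decode(generator.get_sequence(0))
```

With a CPU-only build of the package installed, the same code would fail to load a phi-3-small export, since its SparseAttention operator has no CPU kernel yet, which is the limitation discussed above.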