microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime

Errors when running Phi-3 on NVIDIA CUDA with `onnxruntime_genai`

jackylu0124 opened this issue · comments

Hey all, I am trying to run Phi-3 on NVIDIA CUDA following the steps listed here (https://onnxruntime.ai/docs/genai/tutorials/phi3-python.html#run-with-nvidia-cuda). I was able to install the pip package successfully by running pip install --pre onnxruntime-genai-cuda --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-genai/pypi/simple/, but when I run the inference file with python phi3-qa.py -m cuda/cuda-int4-rtn-block-32, I get the following errors (note that the xxx in the paths in the error logs is just information blurred out for privacy reasons):

Traceback (most recent call last):
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python311\Lib\site-packages\onnxruntime_genai\__init__.py", line 11, in <module>
    from onnxruntime_genai.onnxruntime_genai import *
ImportError: DLL load failed while importing onnxruntime_genai: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\xxx\xxx\xxx\onnx_test\phi3-qa.py", line 2, in <module>
    import onnxruntime_genai as og
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python311\Lib\site-packages\onnxruntime_genai\__init__.py", line 14, in <module>
    from onnxruntime_genai.onnxruntime_genai import *
ImportError: DLL load failed while importing onnxruntime_genai: The specified module could not be found.

I would really appreciate any insights or solutions to this issue. Thanks for your time and help in advance!

Could you please share your CUDA environment? onnxruntime-genai-cuda requires CUDA 11.8 to be installed. The Python package expects the system variable CUDA_PATH to be set to the binary directory in your CUDA toolkit folder.

From your error message, my guess is that Python is not able to locate your CUDA binaries.

Example from my machine:

[screenshot: CUDA_PATH set to the CUDA v11.8 toolkit directory]
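
As a quick way to verify this (a minimal sketch, not from the thread; the path in the comment is only an illustration), you can print the variable from Python:

import os

# Hypothetical check: confirm CUDA_PATH is set and points at a CUDA 11.8 toolkit,
# e.g. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
cuda_path = os.environ.get("CUDA_PATH")
print(cuda_path)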

Hi @baijumeswani, thank you very much for your fast reply! Here's my environment. Do I have to install CUDA 11.8, or is onnxruntime-genai-cuda backward compatible with CUDA 12.1 (which I have installed)?

[screenshot: environment details showing CUDA 12.1 installed]

You need to have CUDA 11.8 installed. We have yet to release CUDA 12 packages.

If installing CUDA 11.8 is not feasible, you could build from source. Here are the instructions: https://onnxruntime.ai/docs/genai/howto/build-from-source.html
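
For reference, a minimal sketch of the starting point (the CUDA-specific build flags live in the linked guide and are not reproduced here):

git clone https://github.com/microsoft/onnxruntime-genai.git
cd onnxruntime-genai
# From here, follow the CUDA build steps in the build-from-source guide above,
# pointing the build at your installed CUDA toolkit.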

I see, thanks for the clarification!

  1. Do you by chance know the timeline on the release of the CUDA 12 packages?
  2. Do you also by chance have containers that you can share for building the packages (especially for building the pip package that's compatible with CUDA 12 on Linux)? I would really appreciate it!

A follow-up to the two questions above. Thanks!

  1. Do you by chance know the timeline on the release of the CUDA 12 packages?

We are planning to add CUDA 12 packages in the next release (0.3.0), probably in the next 2-3 weeks.

  1. Do you also by chance have containers that you can share for building the packages (especially for building the pip package that's compatible with CUDA 12 on Linux)? I would really appreciate it!

We do not have any managed containers that I can share for building onnxruntime-genai-cuda. You could try looking for CUDA containers here.
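
For anyone attempting this, a minimal sketch of such a build container, assuming NVIDIA's public CUDA development images (the base-image tag and package list below are illustrative, not from the thread):

FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

# Basic toolchain for building the Python package from source.
RUN apt-get update && apt-get install -y --no-install-recommends \
        git python3 python3-pip cmake build-essential \
    && rm -rf /var/lib/apt/lists/*

RUN git clone https://github.com/microsoft/onnxruntime-genai.git /src
WORKDIR /src
# Follow the CUDA build steps from the build-from-source guide:
# https://onnxruntime.ai/docs/genai/howto/build-from-source.html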

Hi @baijumeswani, thank you very much for your reply and update! Really looking forward to the 0.3.0 release!

Hi @baijumeswani, a quick follow-up to this: is there any update regarding a 0.3.0 release candidate that supports CUDA 12? Thanks again for the great work!

@jackylu0124 the CUDA 12 pipeline is almost ready. We are still working on fixing a couple of blockers before we can release 0.3.0, but my best estimate at this point is that we will publish the 0.3.0 package sometime this week.

@baijumeswani Got it, thank you all so much again. Really incredible work! Can't wait to try it out!

Hi @baijumeswani, a quick follow-up to this: is there any update regarding when the CUDA 12 compatible version will be released?

Hi @baijumeswani, just a quick follow-up to the message above, thanks!

Hi @baijumeswani, I see that v0.3.0 has been released, but I tried installing the CUDA version with the command pip install onnxruntime-genai-cuda --pre --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-genai/pypi/simple/, and I still see the same error reported above when I run the sample script. Do you by chance know when I will be able to download and install the CUDA 12 compatible version?

Thanks a lot again!

Hi @jackylu0124, sorry for the delay in responding. We ran into some trouble releasing the CUDA 12 binaries due to packaging problems. I am working on having the CUDA 12 package out today.

Hi @baijumeswani, thank you very much for the fast reply and update! I am looking forward to trying it out. Thanks, as always, for the great work you are doing!

Hi @baijumeswani, sorry for bothering you again; is there any update on when the CUDA 12 compatible package will be available? Thanks!

OK, the CUDA 12 packages have finally been published:

Python:

pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ onnxruntime-genai-cuda

NuGet: you can add this to your nuget.config:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <clear />
    <add key="onnxruntime-cuda-12" value="https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/nuget/v3/index.json" />
  </packageSources>
</configuration>

C/C++ API: Packages uploaded to:

https://github.com/microsoft/onnxruntime-genai/releases/download/v0.3.0/

The requirements are CUDA 12 and cuDNN 9.
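
As a quick smoke test, here is a condensed sketch of the phi3-qa.py flow from the tutorial (hedged: this assumes the 0.3.0-era Python API; the model path is the one used earlier in this thread):

import onnxruntime_genai as og

# Load the quantized CUDA model downloaded per the tutorial.
model = og.Model("cuda/cuda-int4-rtn-block-32")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=64)
params.input_ids = tokenizer.encode("<|user|>\nHello!<|end|>\n<|assistant|>")

# Token-by-token generation loop, as in the phi3-qa.py sample.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
print(tokenizer.decode(generator.get_sequence(0)))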

Hi @baijumeswani , thank you so much for the update and release! I really appreciate it and will try it out in a bit.

For the Python package, do I need to use the --pre flag with pip install, as in the example command pip install onnxruntime-genai-cuda --pre --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-genai/pypi/simple/ shown on https://onnxruntime.ai/docs/genai/howto/install.html?

Thanks a lot again!

--pre is needed only for pre-release versions. For a stable release such as 0.3.0, you do not need --pre.
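
Concretely (both commands assembled from index URLs and versions given earlier in this thread):

# pre-release, original index:
pip install --pre onnxruntime-genai-cuda --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-genai/pypi/simple/

# stable 0.3.0 CUDA 12 build, new index:
pip install onnxruntime-genai-cuda==0.3.0 -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/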

Got it, thanks for the clarification!

Hi @baijumeswani, thank you for the great work! I tried out the CUDA 12 package on my Windows machine and it works as expected. I have two follow-up questions:

  1. If I want to install the particular CUDA 12 compatible onnxruntime-genai-cuda version that was just released (when I build my Docker container, for example), do I just run pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ onnxruntime-genai-cuda==0.3.0, since that URL will be reused for future version releases?
  2. Regarding loading the quantized CUDA model with og.Model(), do I only need to keep the Phi-3-mini-4k-instruct-onnx/cuda/cuda-int4-rtn-block-32 folder? Can I remove the Phi-3-mini-4k-instruct-onnx/cpu_and_mobile, Phi-3-mini-4k-instruct-onnx/directml, and Phi-3-mini-4k-instruct-onnx/cuda/cuda-fp16 folders to save some space during deployment?

Thanks a lot for all the help again!

Hi @jackylu0124

  1. Yes, this index URL will be reused for future onnxruntime-genai-cuda Python packages for CUDA 12. If you want only version 0.3.0, you can use ==0.3.0.
  2. For deployment, please keep only the model that you need for og.Model(...). If you only want the cuda-int4-rtn-block-32 model, you do not need to deploy any of the other models (see the sketch below).
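
For example (a minimal sketch; the folder layout is the one described in the question above):

import onnxruntime_genai as og

# Only the folder passed to og.Model needs to ship with the deployment; the
# cpu_and_mobile, directml, and cuda/cuda-fp16 siblings can be removed.
model = og.Model("Phi-3-mini-4k-instruct-onnx/cuda/cuda-int4-rtn-block-32")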

I'll close this issue now. Please feel free to add more comments or questions.