microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime

Errors when running Phi-3 on NVIDIA CUDA with `onnxruntime_genai`

jackylu0124 opened this issue · comments

Hey all, I am trying to run Phi-3 on NVIDIA CUDA following the steps listed here (https://onnxruntime.ai/docs/genai/tutorials/phi3-python.html#run-with-nvidia-cuda). I was able to install the pip package successfully by running pip install --pre onnxruntime-genai-cuda --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-genai/pypi/simple/, but when I run the inference file with python phi3-qa.py -m cuda/cuda-int4-rtn-block-32, I get the following errors (note that the xxx in the paths in the error logs is just information blurred out for privacy reasons):

Traceback (most recent call last):
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python311\Lib\site-packages\onnxruntime_genai\__init__.py", line 11, in <module>
    from onnxruntime_genai.onnxruntime_genai import *
ImportError: DLL load failed while importing onnxruntime_genai: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\xxx\xxx\xxx\onnx_test\phi3-qa.py", line 2, in <module>
    import onnxruntime_genai as og
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python311\Lib\site-packages\onnxruntime_genai\__init__.py", line 14, in <module>
    from onnxruntime_genai.onnxruntime_genai import *
ImportError: DLL load failed while importing onnxruntime_genai: The specified module could not be found.

I would really appreciate any insights or solutions to this issue. Thanks for your time and help in advance!

Could you please share your CUDA environment? onnxruntime-genai-cuda requires CUDA 11.8 to be installed. The Python package expects the system variable CUDA_PATH to be set to the binary directory in your CUDA toolkit folder.

From your error message, my guess is that Python is not able to locate your CUDA binaries.

Example from my machine:

[screenshot: CUDA_PATH set to the CUDA v11.8 toolkit directory]
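
As a quick way to verify this (a minimal sketch, not from the thread; the path in the comment is only an illustration), you can print the variable from Python:

import os

# Hypothetical check: confirm CUDA_PATH is set and points at a CUDA 11.8 toolkit,
# e.g. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
cuda_path = os.environ.get("CUDA_PATH")
print(cuda_path)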

Hi @baijumeswani, thank you very much for your fast reply! Here's my environment. Do I have to install CUDA 11.8, or is onnxruntime-genai-cuda backward compatible with CUDA 12.1 (which I have installed)?

[screenshot: environment details showing CUDA 12.1 installed]

You need to have CUDA 11.8 installed. We have yet to release CUDA 12 packages.

If installing CUDA 11.8 is not feasible, you could build from source. Here are the instructions: https://onnxruntime.ai/docs/genai/howto/build-from-source.html
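
For reference, a minimal sketch of the starting point (the CUDA-specific build flags live in the linked guide and are not reproduced here):

git clone https://github.com/microsoft/onnxruntime-genai.git
cd onnxruntime-genai
# From here, follow the CUDA build steps in the build-from-source guide above,
# pointing the build at your installed CUDA toolkit.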

I see, thanks for the clarification!

  1. Do you by chance know the timeline on the release of the CUDA 12 packages?
  2. Do you also by chance have containers that you can share for building the packages (especially for building the pip package that's compatible with CUDA 12 on Linux)? I would really appreciate it!

A follow-up to the two questions above. Thanks!

  1. Do you by chance know the timeline on the release of the CUDA 12 packages?

We are planning to add CUDA 12 packages in the next release (0.3.0), probably in the next 2-3 weeks.

  1. Do you also by chance have containers that you can share for building the packages (especially for building the pip package that's compatible with CUDA 12 on Linux)? I would really appreciate it!

We do not have any managed containers that I can share for building onnxruntime-genai-cuda. You could try looking for CUDA containers here.
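
For anyone attempting this, a minimal sketch of such a build container, assuming NVIDIA's public CUDA development images (the base-image tag and package list below are illustrative, not from the thread):

FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

# Basic toolchain for building the Python package from source.
RUN apt-get update && apt-get install -y --no-install-recommends \
        git python3 python3-pip cmake build-essential \
    && rm -rf /var/lib/apt/lists/*

RUN git clone https://github.com/microsoft/onnxruntime-genai.git /src
WORKDIR /src
# Follow the CUDA build steps from the build-from-source guide:
# https://onnxruntime.ai/docs/genai/howto/build-from-source.html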

Hi @baijumeswani, thank you very much for your reply and update! Really looking forward to the 0.3.0 release!

Hi @baijumeswani, a quick follow-up to this: is there any update regarding a 0.3.0 release candidate that supports CUDA 12? Thanks again for the great work!

@jackylu0124 the CUDA 12 pipeline is almost ready. We are still working on fixing a couple of blockers before we can release 0.3.0, but my best estimate at this point is that we will publish the 0.3.0 package sometime this week.

@baijumeswani Got it, thank you all so much again. Really incredible work! Can't wait to try it out!

Hi @baijumeswani, a quick follow-up to this: is there any update regarding when the CUDA 12 compatible version will be released?

Hi @baijumeswani, just a quick follow-up to the message above, thanks!

Hi @baijumeswani, I see that v0.3.0 has been released, but I tried installing the CUDA version with the command pip install onnxruntime-genai-cuda --pre --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-genai/pypi/simple/, and I still see the same error reported above when I run the sample script. Do you by chance know when I will be able to download and install the CUDA 12 compatible version?

Thanks a lot again!

Hi @jackylu0124, sorry for the delay in responding. We ran into some trouble releasing the CUDA 12 binaries due to packaging problems. I am working on having the CUDA 12 package out today.

Hi @baijumeswani, thank you very much for the fast reply and update! I am looking forward to trying it out. Thanks, as always, for the great work you are doing!

Hi @baijumeswani, sorry for bothering you again; is there any update on when the CUDA 12 compatible package will be available? Thanks!

OK, the CUDA 12 packages have finally been published:

Python:

pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ onnxruntime-genai-cuda

NuGet: you can add this to your nuget.config:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <clear />
    <add key="onnxruntime-cuda-12" value="https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/nuget/v3/index.json" />
  </packageSources>
</configuration>

C/C++ API: Packages uploaded to:

https://github.com/microsoft/onnxruntime-genai/releases/download/v0.3.0/

The requirements are CUDA 12 and cuDNN 9.
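
As a quick smoke test, here is a condensed sketch of the phi3-qa.py flow from the tutorial (hedged: this assumes the 0.3.0-era Python API; the model path is the one used earlier in this thread):

import onnxruntime_genai as og

# Load the quantized CUDA model downloaded per the tutorial.
model = og.Model("cuda/cuda-int4-rtn-block-32")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=64)
params.input_ids = tokenizer.encode("<|user|>\nHello!<|end|>\n<|assistant|>")

# Token-by-token generation loop, as in the phi3-qa.py sample.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
print(tokenizer.decode(generator.get_sequence(0)))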

Hi @baijumeswani , thank you so much for the update and release! I really appreciate it and will try it out in a bit.

For the Python package, do I need to use the --pre flag with pip install, as in the example command pip install onnxruntime-genai-cuda --pre --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-genai/pypi/simple/ shown on https://onnxruntime.ai/docs/genai/howto/install.html?

Thanks a lot again!

--pre is needed only for pre-release versions. For a stable release such as 0.3.0, you do not need --pre.
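
Concretely (both commands assembled from index URLs and versions given earlier in this thread):

# pre-release, original index:
pip install --pre onnxruntime-genai-cuda --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-genai/pypi/simple/

# stable 0.3.0 CUDA 12 build, new index:
pip install onnxruntime-genai-cuda==0.3.0 -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/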

Got it, thanks for the clarification!

Hi @baijumeswani, thank you for the great work! I tried out the CUDA 12 package on my Windows machine and it works as expected. I have two follow-up questions:

  1. If I want to install the particular CUDA 12 compatible onnxruntime-genai-cuda version that was just released (when I build my Docker container, for example), do I just run pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ onnxruntime-genai-cuda==0.3.0, since that URL will be reused for future version releases?
  2. Regarding loading the quantized CUDA model with og.Model(), do I only need to keep the Phi-3-mini-4k-instruct-onnx/cuda/cuda-int4-rtn-block-32 folder? Can I remove the Phi-3-mini-4k-instruct-onnx/cpu_and_mobile, Phi-3-mini-4k-instruct-onnx/directml, and Phi-3-mini-4k-instruct-onnx/cuda/cuda-fp16 folders to save some space during deployment?

Thanks a lot for all the help again!

Hi @jackylu0124

  1. Yes, this index URL will be reused for future onnxruntime-genai-cuda Python packages for CUDA 12. If you want only version 0.3.0, you can use ==0.3.0.
  2. For deployment, please keep only the model that you need for og.Model(...). If you only want the cuda-int4-rtn-block-32 model, you do not need to deploy any of the other models (see the sketch below).
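
For example (a minimal sketch; the folder layout is the one described in the question above):

import onnxruntime_genai as og

# Only the folder passed to og.Model needs to ship with the deployment; the
# cpu_and_mobile, directml, and cuda/cuda-fp16 siblings can be removed.
model = og.Model("Phi-3-mini-4k-instruct-onnx/cuda/cuda-int4-rtn-block-32")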

I'll close this issue now. Please feel free to add more comments or questions.