microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime

With Latest GeForce Driver v555 - Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

AshD opened this issue · comments

commented

Upgraded to the latest GeForce driver v555 and using Phi-3-medium-instruct 128K on an RTX 4090 Windows 11 PC.

The same C# inference code now throws an exception at `generator.ComputeLogits();`

Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: 'D:\a_work\1\onnxruntime-genai\src\dml\dml_command_recorder.cpp(143)\onnxruntime-genai.DLL!00007FFE7A4957F8: (caller: 00007FFE7A4959AD) Exception(1) tid(46d8) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

commented

Using Microsoft.ML.OnnxRuntimeGenAI.DirectML v0.2

commented

Some more info that made me upgrade the driver:
https://blogs.nvidia.com/blog/rtx-advanced-ai-windows-pc-build/

Developers can unlock the full capabilities of RTX hardware with the new [R555 driver](https://www.nvidia.com/download/index.aspx), bringing better AI experiences to consumers, faster. It includes:

- Support for DQ-GEMM metacommand to handle INT4 weight-only quantization for LLMs
- New RMSNorm normalization methods for Llama 2, Llama 3, Mistral and Phi-3 models
- Group and multi-query attention mechanisms, and sliding window attention to support Mistral
- In-place KV updates to improve attention performance
- Support for GEMM of non-multiple-of-8 tensors to improve context phase performance

Hi @AshD, was it working before upgrading the driver? Regardless, I'll be able to look into it soon.

commented

Yes. It was working yesterday.

I was having a different issue yesterday: on the line below, when MaxLength was set to 32000 for the 128K Phi-3 medium model, the model's GPU memory usage went to over 24 GB on my 4090.
`generatorParams.SetSearchOption("max_length", onnxRuntimeGenAIPromptExecutionSettings.MaxLength);`

> Yes. It was working yesterday.
>
> I was having a different issue yesterday on the line below when MaxLength was set to 32000 for the 128K Medium Phi-3, the model's GPU memory went to over 24GB on my 4090. `generatorParams.SetSearchOption("max_length", onnxRuntimeGenAIPromptExecutionSettings.MaxLength);`

The memory issue makes sense. We are already looking into making the onnxruntime memory usage way better in our upcoming release (planned for June), but using a max length of 32000 will always have a significant memory requirement due to the size of the cache.
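To put rough numbers on that cache size, here is a back-of-the-envelope estimate. The config values below (40 layers, 10 KV heads via grouped-query attention, head dim 128, fp16 cache) are assumptions taken from the published Phi-3-medium model card; the actual DirectML allocation will also include weights and activations on top of this.

```python
# Rough KV-cache size estimate for Phi-3-medium at max_length=32000.
# Config values are assumptions from the published Phi-3-medium model
# card; adjust them if your model variant differs.
NUM_LAYERS = 40
NUM_KV_HEADS = 10      # grouped-query attention
HEAD_DIM = 128         # hidden_size 5120 / 40 attention heads
BYTES_PER_ELEM = 2     # fp16

def kv_cache_bytes(seq_len: int) -> int:
    # 2x accounts for the separate key and value tensors in each layer
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * seq_len * BYTES_PER_ELEM

print(f"{kv_cache_bytes(32000) / 1e9:.2f} GB")  # ~6.55 GB for the cache alone
```

That is several gigabytes for the cache by itself, before the model weights are counted, which is why a 32000-token max length is expensive even on a 24 GB card.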

I'll let you know once I have an update on the driver issue.

commented

I verified that this is an issue with the latest Nvidia Game Ready driver v555.85 by downgrading to v552.44.

Nvidia Game Ready driver v552.44 works!

Hi @AshD,

Nvidia was able to reproduce the issue and fixed it internally. I'll let you know once we have a release date for the new driver. This issue should only happen when the sequence length is bigger than 25550, so maybe you can limit the max length to that value for the time being.

Hi @AshD,

Nvidia released a new driver on 6/4. Can you verify if it fixes your issue?

commented

I think Nvidia fixed that issue.

But I am getting another error, which I think occurs when the GPU runs out of memory.

Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: 'Non-zero status code returned while running DmlFusedNode_0_0 node. Name:'DmlFusedNode_0_0' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(1060)\onnxruntime.dll!00007FFBDF3FA0B1: (caller: 00007FFBDF488F0B) Exception(2) tid(426c) 887A0001 The application made a call that is invalid. Either the parameters of the call or the state of some object was incorrect.
Enable the D3D debug layer in order to see details via debug messages.
'

commented

I am following this sample:
https://github.com/microsoft/onnxruntime-genai/blob/main/examples/csharp/HelloPhi3V/Program.cs

It's not releasing the GPU memory inside the `while (true)` loop.
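If the generator created inside the loop is never disposed, its KV cache stays allocated on the GPU across turns. The general fix is to scope the generator's lifetime to one iteration (a `using` block in C#, or a context manager / explicit `del` in Python). A minimal illustration of the pattern, where `GpuBuffer` is a stand-in for a real generator object that owns GPU memory:

```python
# Illustrative pattern only: "GpuBuffer" stands in for a generator that
# owns GPU memory. The point is that cleanup runs at the end of every
# loop iteration, even if generation raises, like a C# `using` block.
class GpuBuffer:
    def __init__(self):
        self.released = False

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.released = True  # release GPU memory here
        return False

def run_turns(prompts):
    buffers = []
    for _ in prompts:
        with GpuBuffer() as buf:  # disposed before the next turn starts
            buffers.append(buf)
    return all(b.released for b in buffers)
```

Without per-iteration disposal, each turn would pile another cache allocation on top of the last, which matches the memory growth described above.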

I am hitting a similar issue after updating to the Nvidia driver with the driver fix mentioned above. Before the driver update it was always failing. After the driver update, it succeeded once and then went back to failing.

Error: RuntimeError: D:\a\_work\1\onnxruntime-genai\src\dml\dml_command_recorder.cpp(143)\onnxruntime_genai.cp312-win_amd64.pyd!00007FF8AE2CECC3: (caller: 00007FF8AE2C28F5) Exception(1) tid(1368) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

Repro notebook: https://github.com/mmaitre314/phi3-experiments/blob/main/onnx-dml.ipynb

Driver updated to: 555.99-notebook-win10-win11-64bit-international-dch-whql

GPU info in Task Manager:

NVIDIA GeForce GTX 1650 Ti
Driver version: 32.0.15.5599
Driver date: 6/1/2024
DirectX version: 12 (FL 12.1)
Physical location: PCI bus 1, device 0, function 0
Dedicated GPU memory: 3.3/4.0 GB
Shared GPU memory: 0.6/3.9 GB
GPU Memory: 3.9/7.9 GB