microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime

With Latest GeForce Driver v555 - Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

AshD opened this issue · comments

commented

Upgraded to the latest GeForce driver v555 and using Phi-3-medium-instruct 128K on an RTX 4090 Windows 11 PC.

The same C# inference code now throws an exception at `generator.ComputeLogits();`

Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: 'D:\a_work\1\onnxruntime-genai\src\dml\dml_command_recorder.cpp(143)\onnxruntime-genai.DLL!00007FFE7A4957F8: (caller: 00007FFE7A4959AD) Exception(1) tid(46d8) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

commented

Using Microsoft.ML.OnnxRuntimeGenAI.DirectML v0.2

commented

Some more info that made me upgrade the driver:
https://blogs.nvidia.com/blog/rtx-advanced-ai-windows-pc-build/

Developers can unlock the full capabilities of RTX hardware with the new [R555 driver](https://www.nvidia.com/download/index.aspx), bringing better AI experiences to consumers, faster. It includes:

- Support for DQ-GEMM metacommand to handle INT4 weight-only quantization for LLMs
- New RMSNorm normalization methods for Llama 2, Llama 3, Mistral and Phi-3 models
- Group and multi-query attention mechanisms, and sliding window attention to support Mistral
- In-place KV updates to improve attention performance
- Support for GEMM of non-multiple-of-8 tensors to improve context phase performance

Hi @AshD, was it working before upgrading the driver? Regardless, I'll be able to look into it soon.

commented

Yes. It was working yesterday.

I was having a different issue yesterday: on the line below, when MaxLength was set to 32000 for the 128K Phi-3 medium model, the model's GPU memory usage went to over 24 GB on my 4090.
`generatorParams.SetSearchOption("max_length", onnxRuntimeGenAIPromptExecutionSettings.MaxLength);`

> Yes. It was working yesterday.
>
> I was having a different issue yesterday on the line below when MaxLength was set to 32000 for the 128K Medium Phi-3, the model's GPU memory went to over 24GB on my 4090. `generatorParams.SetSearchOption("max_length", onnxRuntimeGenAIPromptExecutionSettings.MaxLength);`

The memory issue makes sense. We are already looking into making the onnxruntime memory usage way better in our upcoming release (planned for June), but using a max length of 32000 will always have a significant memory requirement due to the size of the cache.
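To put rough numbers on that cache size, here is a back-of-the-envelope estimate. The config values below (40 layers, 10 KV heads via grouped-query attention, head dim 128, fp16 cache) are assumptions taken from the published Phi-3-medium model card; the actual DirectML allocation will also include weights and activations on top of this.

```python
# Rough KV-cache size estimate for Phi-3-medium at max_length=32000.
# Config values are assumptions from the published Phi-3-medium model
# card; adjust them if your model variant differs.
NUM_LAYERS = 40
NUM_KV_HEADS = 10      # grouped-query attention
HEAD_DIM = 128         # hidden_size 5120 / 40 attention heads
BYTES_PER_ELEM = 2     # fp16

def kv_cache_bytes(seq_len: int) -> int:
    # 2x accounts for the separate key and value tensors in each layer
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * seq_len * BYTES_PER_ELEM

print(f"{kv_cache_bytes(32000) / 1e9:.2f} GB")  # ~6.55 GB for the cache alone
```

That is several gigabytes for the cache by itself, before the model weights are counted, which is why a 32000-token max length is expensive even on a 24 GB card.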

I'll let you know once I have an update on the driver issue.

commented

I verified that this is an issue with the latest Nvidia Game Ready driver v555.85 by downgrading to v552.44.

Nvidia Game Ready driver v552.44 works!

Hi @AshD,

Nvidia was able to reproduce the issue and fixed it internally. I'll let you know once we have a release date for the new driver. This issue should only happen when the sequence length is bigger than 25550, so maybe you can limit the max length to that value for the time being.

Hi @AshD,

Nvidia released a new driver on 6/4. Can you verify if it fixes your issue?

commented

I think Nvidia fixed that issue.

But I am getting another error, which I think occurs when the GPU runs out of memory.

Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: 'Non-zero status code returned while running DmlFusedNode_0_0 node. Name:'DmlFusedNode_0_0' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(1060)\onnxruntime.dll!00007FFBDF3FA0B1: (caller: 00007FFBDF488F0B) Exception(2) tid(426c) 887A0001 The application made a call that is invalid. Either the parameters of the call or the state of some object was incorrect.
Enable the D3D debug layer in order to see details via debug messages.
'

commented

I am following this sample:
https://github.com/microsoft/onnxruntime-genai/blob/main/examples/csharp/HelloPhi3V/Program.cs

It's not releasing the GPU memory inside the `while (true)` loop.
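If the generator created inside the loop is never disposed, its KV cache stays allocated on the GPU across turns. The general fix is to scope the generator's lifetime to one iteration (a `using` block in C#, or a context manager / explicit `del` in Python). A minimal illustration of the pattern, where `GpuBuffer` is a stand-in for a real generator object that owns GPU memory:

```python
# Illustrative pattern only: "GpuBuffer" stands in for a generator that
# owns GPU memory. The point is that cleanup runs at the end of every
# loop iteration, even if generation raises, like a C# `using` block.
class GpuBuffer:
    def __init__(self):
        self.released = False

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.released = True  # release GPU memory here
        return False

def run_turns(prompts):
    buffers = []
    for _ in prompts:
        with GpuBuffer() as buf:  # disposed before the next turn starts
            buffers.append(buf)
    return all(b.released for b in buffers)
```

Without per-iteration disposal, each turn would pile another cache allocation on top of the last, which matches the memory growth described above.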

I am hitting a similar issue after updating to the Nvidia driver with the driver fix mentioned above. Before the driver update it was always failing. After the driver update, it succeeded once and then went back to failing.

Error: RuntimeError: D:\a\_work\1\onnxruntime-genai\src\dml\dml_command_recorder.cpp(143)\onnxruntime_genai.cp312-win_amd64.pyd!00007FF8AE2CECC3: (caller: 00007FF8AE2C28F5) Exception(1) tid(1368) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

Repro notebook: https://github.com/mmaitre314/phi3-experiments/blob/main/onnx-dml.ipynb

Driver updated to: 555.99-notebook-win10-win11-64bit-international-dch-whql

GPU info in Task Manager:

NVIDIA GeForce GTX 1650 Ti
Driver version: 32.0.15.5599
Driver date: 6/1/2024
DirectX version: 12 (FL 12.1)
Physical location: PCI bus 1, device 0, function 0
Dedicated GPU memory: 3.3/4.0 GB
Shared GPU memory: 0.6/3.9 GB
GPU Memory: 3.9/7.9 GB