[CUDA] cuMemPrefetchAsync Error

Question

JiaweiRen2022 opened this issue 6 months ago

What happened?

I want to compile and run a matrix multiplication example.
It compiles successfully with the old version (commit 5abc05fd23efb109a2bf0170f47fd73cd01e2dad), but running it fails. The error is reported while parsing the input argument, as follows:

iree/runtime/src/iree/hal/drivers/cuda/cuda_allocator.c:339: INTERNAL; CUDA driver error 'CUDA_ERROR_INVALID_VALUE' (1): invalid argument; parsing value '40x60x90xf32=1'

The error does not occur with a newer version (commit 65817331c95a48ac3f7d972667f4bd2c35b8f61e).
While debugging, I found that the error comes from the cuMemPrefetchAsync call made while initializing GPU memory.
However, I could not find the exact change that fixed it.

Could someone help me locate the change that fixed this? (Based on the two commits, it was made between April 17 and July 11.)
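One way to locate the fixing commit is `git bisect` with custom terms, since here the older commit is the broken one. The script below is a hypothetical sketch, not part of IREE. It assumes an already configured, CUDA-enabled IREE build directory, tool binaries under `<build_dir>/tools/`, and that `--iree-input-type=tosa` is accepted at every commit in the range; commits that fail to build or compile are skipped (exit code 125).

```shell
#!/usr/bin/env bash
# bisect_check.sh: hypothetical helper for `git bisect run` (not part of IREE).
# Suggested use from the IREE checkout (paths are assumptions):
#   git bisect start --term-old=broken --term-new=fixed
#   git bisect fixed 6581733
#   git bisect broken 5abc05f
#   git bisect run /abs/path/bisect_check.sh /abs/path/iree-build /abs/path/test.mlir
# git bisect run treats exit 0 as the "old" term and 1..127 (except 125) as
# the "new" term, so:
#   0   -> broken: the CUDA_ERROR_INVALID_VALUE failure reproduces
#   1   -> fixed:  the module runs successfully
#   125 -> skip:   build, compile, or an unrelated runtime failure

# Map iree-run-module output and exit status to a bisect exit code.
classify_run() {
  local output="$1" status="$2"
  if grep -q "CUDA_ERROR_INVALID_VALUE" <<<"$output"; then
    return 0
  elif [[ "$status" -eq 0 ]]; then
    return 1
  else
    return 125
  fi
}

main() {
  local build_dir="$1" mlir="$2"
  local vmfb="$build_dir/bisect.vmfb"
  # Sync submodules to the checked-out commit, then rebuild (can be slow).
  git submodule update --init || return 125
  cmake --build "$build_dir" || return 125
  "$build_dir/tools/iree-compile" "$mlir" --iree-input-type=tosa \
      --iree-hal-target-backends=cuda -o "$vmfb" || return 125
  local output status
  output=$("$build_dir/tools/iree-run-module" --module="$vmfb" --device=cuda \
      --function=forward --input="40x60x90xf32=1" 2>&1)
  status=$?
  classify_run "$output" "$status"
}

# Only run the check when given both arguments, so the functions can be sourced.
if [[ $# -ge 2 ]]; then
  main "$@"
  exit $?
fi
```

Custom terms avoid the inverted meaning of "good" and "bad" when searching for a fix rather than a regression.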

Steps to reproduce your issue

Input Test Case:

module attributes {torch.debug_module_name = "matmul"} {
  func.func @forward(%arg0: tensor<40x60x90xf32>) -> tensor<40x60x80xf32> {
    %0 = "tosa.const"() {value = dense<2.0> : tensor<40x90x80xf32>} : () -> tensor<40x90x80xf32>
    %1 = "tosa.matmul"(%arg0, %0) : (tensor<40x60x90xf32>, tensor<40x90x80xf32>) -> tensor<40x60x80xf32>
    return %1 : tensor<40x60x80xf32>
  }
}

Old version commands:

./iree-compile test.mlir --iree-input-type=tosa --iree-hal-target-backends=cuda -o a.vmfb

./iree-run-module --module=a.vmfb --device=cuda  --function=forward --input="40x60x90xf32=1"

Old version result:

iree/runtime/src/iree/hal/drivers/cuda/cuda_allocator.c:339: INTERNAL; CUDA driver error 'CUDA_ERROR_INVALID_VALUE' (1): invalid argument; parsing value '40x60x90xf32=1'

New version commands:

./iree-compile test.mlir --iree-hal-target-backends=cuda -o a.vmfb

iree-run-module --module=a.vmfb --device=cuda --function=forward --input="40x60x90xf32=1"

New version result:

EXEC @forward
result[0]: hal.buffer_view
40x60x80xf32=[[180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180][180 180 180 180 180 18 .......
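As a sanity check on the new version's output: every input element is 1 (from --input="40x60x90xf32=1"), every element of the constant %0 is 2.0, and the contracted dimension has size 90, so every output element should be 180:

```latex
\mathrm{result}[b,i,j] = \sum_{k=1}^{90} \mathrm{arg0}[b,i,k] \cdot \mathrm{cst}[b,k,j]
                       = \sum_{k=1}^{90} 1 \cdot 2 = 180,
\qquad 1 \le b \le 40,\; 1 \le i \le 60,\; 1 \le j \le 80.
```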

What component(s) does this issue relate to?

Runtime

Version information

old version commit id: 5abc05f

new version commit id: 6581733

Additional context

No response